4. Problems with current DWs
Source: TDWI 2009 BP Survey. 1752 responses from 417 respondents.
Analytics
Real-Time
Scalability
"Old" stuff
Mixed workload
"Modern" stuff
6. What is an SSD?
• A Solid-State Disk (SSD) is a data storage unit
> Same electrical and physical package as a hard disk
> Built from an array of flash memory chips
Traditional hard disk vs. solid-state disk
7. SSD vs. traditional hard disk

                        Enterprise SSD (2009)    Enterprise 15K HDD (2009)
IOPS (4 KB)             10,000                   100
Bandwidth (MB/s)        >225                     >150
Random I/O latency      microseconds (10^-4 s)   milliseconds (10^-3 s)
Active power (W)        <2 W                     12-18 W
Idle power (W)          <1 W                     4-10 W
Max capacity (GB)       256 GB                   600 GB

• Better in both power consumption and performance.
8. Top speed in the data warehouse
Extreme Performance Appliance
• Built on Teradata Blurr technology
> The first data warehouse platform built with SSD technology
> Raises the performance of the industry-leading Teradata data warehouse platform
• SSD is "enterprise ready"
> High performance gains
> High availability
• Teradata 13
> High performance
• A member of the Teradata platform family
> Reliable hardware/software and features
Blurr™ Technology
9. Top speed in the data warehouse
Extreme Performance Appliance
• Blurr technology delivers high performance
> 4 million I/Os per second
– Others at 1 million IOPS
> 9 TB+ of data loaded per hour
– Others at 5 TB per hour
> 18x faster than today's data warehouses
SSD
10. SSD in a multi-tier hybrid architecture (2011)
[Diagram: the Teradata Active Enterprise Data Warehouse with automatic data migration
across three storage tiers: solid-state drives (SSD) for high performance,
300 GB SAS drives for traditional performance, and 1 TB SATA drives for archive data]
12. Source: Temporal Data and the Relational Model by C.J. Date, H. Darwen, N. Lorentzos
What is a temporal database?
• A temporal database provides special support for the time dimension: a purpose-built structure for storing, querying, and updating historical data.
• Conventional database systems, including every mainstream product on the market today, are not temporal in this sense: in fact, none of them provides full temporal support.
13. Key concepts
• VALID TIME: the period during which a fact is true in the real world
• TRANSACTION TIME: the time at which the data was written to the database
• BI-TEMPORAL DATA: a system in which valid time and transaction time are recorded together
• Valid time and transaction time need not be the same for any given fact
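The two time dimensions can be sketched as a plain data structure. This is a hypothetical illustration: the names `BitemporalRow` and `FOREVER`, and the dates used, are mine, not Teradata's.

```python
from dataclasses import dataclass
from datetime import date

# Conventional "end of time" sentinel, matching the 9999-12-31 seen on the slides
FOREVER = date(9999, 12, 31)

@dataclass
class BitemporalRow:
    """A bi-temporal row carries both time dimensions (illustrative sketch)."""
    musteri_no: int
    statu: str
    valid_from: date   # VALID TIME: when the fact holds in the real world
    valid_to: date
    tx_from: date      # TRANSACTION TIME: when the row was written to the database
    tx_to: date = FOREVER

row = BitemporalRow(1, "A", valid_from=date(2010, 1, 1), valid_to=FOREVER,
                    tx_from=date(2010, 1, 5))
# The two dimensions need not match: here the fact became true on Jan 1,
# but was only entered into the database on Jan 5.
assert row.valid_from != row.tx_from
```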
14. Example
• Create table
CREATE TABLE musteri_statu (
    musteri_no INTEGER,
    statu CHAR(1),
    valid_time PERIOD(DATE) NOT NULL AS VALIDTIME);
• Insert
INSERT INTO musteri_statu VALUES (1, 'A', PERIOD '(2010-01-01, UNTIL_CHANGED)');
• Update
UPDATE musteri_statu SET statu = 'B' WHERE musteri_no = 1;

Before the update:
musteri_no  statu  valid_time
1           A      (2010-01-01, 9999-12-31)

After the update:
musteri_no  statu  valid_time
1           A      (2010-01-01, 2010-11-23)
1           B      (2010-11-23, 9999-12-31)
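Teradata performs this row split automatically for VALIDTIME tables. As a rough sketch of the same effect, the two steps can be simulated by hand in SQLite (which has no temporal support), assuming the update runs on 2010-11-23:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE musteri_statu (
    musteri_no INTEGER, statu CHAR(1),
    valid_from TEXT NOT NULL, valid_to TEXT NOT NULL)""")

# INSERT ... PERIOD '(2010-01-01, UNTIL_CHANGED)': open-ended validity
con.execute("INSERT INTO musteri_statu VALUES (1, 'A', '2010-01-01', '9999-12-31')")

# A temporal UPDATE on 2010-11-23 closes the old row and opens a new one;
# Teradata does both steps implicitly, here we spell them out.
today = "2010-11-23"
con.execute("""UPDATE musteri_statu SET valid_to = ?
               WHERE musteri_no = 1 AND valid_to = '9999-12-31'""", (today,))
con.execute("INSERT INTO musteri_statu VALUES (1, 'B', ?, '9999-12-31')", (today,))

print(con.execute("SELECT * FROM musteri_statu ORDER BY valid_from").fetchall())
# [(1, 'A', '2010-01-01', '2010-11-23'), (1, 'B', '2010-11-23', '9999-12-31')]
```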
15. Example
Timeline: customer no. 1 holds status A from 01/01/2010 to 23/11/2010, and status B from 23/11/2010 to 31/12/9999 (eternity).

musteri_no  statu  valid_time
1           A      (2010-01-01, 2010-11-23)
1           B      (2010-11-23, 9999-12-31)

VALIDTIME AS OF DATE '2010-11-21'
SELECT * FROM musteri_statu WHERE musteri_no = 1;
(returns the 'A' row)

VALIDTIME AS OF DATE '2010-11-25'
SELECT * FROM musteri_statu WHERE musteri_no = 1;
(returns the 'B' row)
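The AS OF filter reduces to a simple interval test, sketched here in plain SQLite with `valid_time` treated as a half-open interval [valid_from, valid_to); this is a simulation of the idea, not Teradata's implementation:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE musteri_statu
               (musteri_no INTEGER, statu TEXT, valid_from TEXT, valid_to TEXT)""")
con.executemany("INSERT INTO musteri_statu VALUES (?,?,?,?)",
                [(1, "A", "2010-01-01", "2010-11-23"),
                 (1, "B", "2010-11-23", "9999-12-31")])

def as_of(d):
    # AS OF DATE d: the row whose validity period contains d
    # (half-open interval: valid_from <= d < valid_to)
    return con.execute("""SELECT statu FROM musteri_statu
                          WHERE musteri_no = 1
                            AND valid_from <= ? AND ? < valid_to""",
                       (d, d)).fetchone()[0]

print(as_of("2010-11-21"))  # A, the status before the change
print(as_of("2010-11-25"))  # B, the status after the change
```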
16. Benefits of temporal data
Technical
• Reduced query complexity
> Work with the core values in the data instead of managing time by hand
> Shorter development and maintenance times
Business
• Greater data depth for business intelligence applications
> Query data as of a specific point in time and manage chains of events
> Uncover business value that was previously hard to find in historical data
> Easier historical management of events
18. Multi-dimensional modeling in business intelligence
Implementation methods
[Diagram: a 3NF schema serving decision support systems and a star schema serving BI solutions, used by analysts, marketing, management, and finance]
• Third Normal Form
> For ad-hoc analysis
> Flexible and extensible
> Provides the most detailed data
> Gives views of the business from different angles
> Important for decision support systems
• Star/Snowflake schema
> Optimized for predetermined business questions
> Provides pre-summarized data
> Models reporting needs
> Built for BI and OLAP reporting
19. BI implementation styles
[Diagram: management, analyst, sales, and finance users working over a multi-dimensional view served by physical cubes (MOLAP), relational OLAP (ROLAP), and hybrid OLAP (HOLAP) on the Teradata data warehouse]
Multi-dimensional view
• A multi-dimensional view of the data
• Summarized data
• Drill down to detail data
BI view / OLAP applications
• Detail data at different levels
• An architecture based on summarizing the data
Teradata data warehouse
• Drill-through
• Drill-down
• Faster cube structures
• A powerful OLAP engine
20. MOLAP vs. ROLAP
MOLAP
• Data is summarized and exported to another platform
• Requires extra disk space
• Data is not current
• Rebuilding from scratch takes time
• Scalability problems
• Detail-level analysis is difficult
• Maintenance and management problems
Teradata ROLAP
• Summary data is kept as an index
• Disk space needs are minimal
• Data is ready for analysis the moment it enters the database
• Easy to create
• Scalable
• Detail data is queried and used by the optimizer
• No maintenance or management problems
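The "ready for analysis the moment it enters the database" point can be made concrete with a minimal sketch: in the ROLAP style, aggregates are computed in the same database that holds the detail rows, so there is no cube to rebuild. Tables and numbers below are illustrative.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (region TEXT, day TEXT, amount REAL)")
con.executemany("INSERT INTO sales VALUES (?,?,?)",
                [("EU", "2010-01-01", 10.0), ("EU", "2010-01-02", 20.0),
                 ("US", "2010-01-01", 5.0)])

# ROLAP style: the aggregate is answered from the detail rows in place,
# with no export step and no separate cube store.
rows = con.execute("""SELECT region, SUM(amount) FROM sales
                      GROUP BY region ORDER BY region""").fetchall()
print(rows)  # [('EU', 30.0), ('US', 5.0)]

# New detail data is visible to analysis immediately, no rebuild required:
con.execute("INSERT INTO sales VALUES ('US', '2010-01-03', 7.0)")
rows = con.execute("""SELECT region, SUM(amount) FROM sales
                      GROUP BY region ORDER BY region""").fetchall()
print(rows)  # [('EU', 30.0), ('US', 12.0)]
```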
21. Business intelligence on Teradata
Multi-dimensional view (BI tools)
PI, PPI, AJI, views
• Teradata benefits
> Greater data depth
> Shorter analytic query times
> Data latency is eliminated
> Lower disk space requirements
> Scalability problems disappear
> Lower total cost of ownership
> The workload is handled by Teradata
22. Banking – Lloyds TSB
Problem
• A mixed, uncoordinated BI platform
• The search for a scalable, flexible solution that does not demand deep technical skills
• Analysts lose too much time writing SQL statements
• High effort to produce standard and daily reports
• Business units lack direct access to the data
Solution
• MS SQL Server Analysis Services 2005 in ROLAP mode
• MOLAP functionality via Teradata Aggregate Join Indexes
• Data duplication is eliminated
• No large data transfers
Results
• Single-dimension queries: <1 s vs. 40 s
• All-dimension queries: 2 s vs. 7 minutes
• All other queries within seconds
Impact
• Reporting time down from minutes to seconds
• Ease of use
• Easy access to all required data, saving time
• A consistent reporting environment
• More work with fewer people
• Lower total cost of ownership
23. Telecommunications – Verizon
Problem
• Scalability problems
• Inability to keep up with growing business demands
• The existing infrastructure could not meet end-user requests
Solution
• ROLAP cubes: the most effective option
• Usable infrastructure
• AJI
• Management of different workloads
• Configurable analytic services
Results
• Reduced storage use
• Time savings
• High performance
• Increased business value
• Reduced hardware requirements
• Shorter development and maintenance cycles
Impact
• Support for a high number of dimensions and measures
• A flexible, scalable model
• Previously undetectable trends can now be found easily
• Drill-down to all data held in the data warehouse
Cube build time: from 13 hours to 3 minutes
Cube size: from 22.4 GB to under 10 GB
Data detail: from monthly summary to daily
25. The SAS and Teradata partnership moved data mining into the database
In the most common use of data mining, data is extracted from the data warehouse onto the SAS platform.
Problems
• Duplicated data and high infrastructure cost
• Analysts spend most of their time on data preparation, which yields little business value
• Data movement and data quality problems
[Diagram. Old approach: data moves out of the warehouse to SAS for analytic modeling and scoring. New approach: SAS modeling and scoring run against the analytic data inside the warehouse.]
Benefits
SAS and Teradata moved this entire process into the data warehouse
• Cost advantage: a single system, no separate hardware requirement
• Analysts analyze data instead of preparing it
• High-speed scoring
26. Competitive advantage
Traditional analytic process chain
[Diagram, plotted against time: business understanding, data collection, data preparation, data discovery, model development, then model and data transformation (70% of the development process), the SAS model, model deployment, and finally the operational analytic view, supported by a modeling ADS and a scoring ADS]
27. Competitive advantage
"In-database" analytic process chain
[Diagram, plotted against time: business understanding, model development, the SAS model, and the operational analytic view (actionable business intelligence), supported by a scoring ADS]
Model development and deployment: from days to hours
SELECT musteri_degeri_skorla(1, 'ABC', 2000, ......);
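The call above is the deck's (partly elided) in-database scoring function. As a hedged sketch of the idea, the snippet below registers a stand-in for musteri_degeri_skorla as a SQL function, so scoring runs where the data lives; the coefficients are invented for illustration, and SQLite's create_function merely plays the role of pushing a model into the database.

```python
import sqlite3

def musteri_degeri_skorla(musteri_no, segment, gelir):
    # Hypothetical stand-in for a SAS-generated scoring model:
    # a tiny linear score with made-up coefficients.
    score = 0.1 * gelir + (50 if segment == "ABC" else 0)
    return round(score, 2)

con = sqlite3.connect(":memory:")
# Register the model as a SQL function so scoring happens in-database,
# instead of exporting the rows to an external analytics platform.
con.create_function("musteri_degeri_skorla", 3, musteri_degeri_skorla)
con.execute("CREATE TABLE musteri (musteri_no INTEGER, segment TEXT, gelir REAL)")
con.execute("INSERT INTO musteri VALUES (1, 'ABC', 2000)")

score = con.execute(
    "SELECT musteri_degeri_skorla(musteri_no, segment, gelir) FROM musteri"
).fetchone()[0]
print(score)  # 250.0
```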
In the next few releases of Teradata software and hardware, we are moving to an architecture that is both cheaper for our customers and faster because of multi-temperature data. It's cheaper because we allow clients to mix disk drives attached to a server node. Some are faster with less data in them, so that the ratio of I/Os per second to gigabytes favors faster response time per gigabyte. There will also be larger disk drives that cost less per gigabyte but have a less attractive I/O rate per gigabyte.
As we move forward, we will also include solid state disk units that will be able to feed data to the incredibly fast Intel microprocessors at speeds more in line with silicon than spinning disks.
The key to this slide is the automated migration. Clients can do some of this today on non-Teradata systems, but it brings a host of problems. At minimum, it's labor-intensive, requiring constant care and attention from the DBAs. But at its worst, clients can easily unbalance their configurations, making query response times inconsistent, hard to troubleshoot, and horrible for capacity planning.
Okay. So let's start off with temporal. I am going to spend a bit of extra time on temporal, because it is going to be a differentiator for us. The subject of temporal databases has been researched and studied since about 1980. We feel the time has come to enable our customers to start exploiting the full value of the time dimension.
There are about 2,000 different publications on the topic, as well as several textbooks, and across those publications and textbooks the definition of what a temporal database is varies somewhat. It doesn't vary a lot, but it seems to differ in some ways, so I decided to choose one of the leading textbooks, as you can see on the slide there, by C.J. Date, Darwen, and Lorentzos, names you are probably familiar with, and use their definition.
Temporal database systems are systems that include special support for the time dimension. In other words, they are systems that provide special facilities for storing, querying, and updating historical and/or future data. Conventional database systems, including all of the existing mainstream products today, are not temporal in the sense that they provide essentially no special support for temporal data.
Now, you may think, well, people are doing temporal types of work with databases, the relational database as it is today. True, but there is no inherent native support for temporal processing within any of the commercial databases today.
Now we'll look at some of the key concepts for temporal, and I'm just giving you the absolute basic minimum, because as you begin to talk about temporal, it can become very complex and confusing as each word is spoken. So I'm going to give you the two or three things that you absolutely need to grasp before you can understand anything about temporal support.
Temporal data is stored in a temporal database, and it's different from the data stored in a non-temporal database in that a time period attached to the data expresses when it was valid or the actual moment it was stored in the database. Those are two different things. Conventional databases consider the data stored in them to be valid at the time instant "now"; they do not keep track of past or future database states, but by attaching a time period to the data, it becomes possible to store different database states.
The first step towards a temporal database is to time-stamp each data row with a time period. This allows the distinction of different database states. Now, what time period do we store in these time stamps? There are mainly two different notions of time which are relevant to temporal databases. One is called valid time, and the other one is called transaction time.
Valid time denotes the time period during which a fact is true with respect to the real world. The other one, transaction time, as you can probably guess, is the time at which a fact is entered into the database, and combining the two results in what is known as bi-temporal data. So if you come away from this understanding what valid time, transaction time, and bi-temporal data are, that will be excellent.
Now, note that these two time periods do not have to be the same for a single fact. Imagine that we come up with a temporal database storing data about the 18th Century. The valid time of these facts is somewhere between 1700 and 1799, whereas the transaction time starts when we insert the facts into the database. Those are the two key concepts I’d like you to try and remember.
Let’s move to how the current situation changes with temporal support. In other words, once we start supporting temporal processing in the database, over time how are things going to improve? Well, there’s benefits in the technical area, and there’s benefits in the business area.
The technical area comes down to a reduction in complexity. The cost of developing queries and maintaining temporal data is going to go down compared to today. Development and maintenance productivity will improve. Therefore, the cost and effort of monitoring and maintaining temporal data chains will go down, so it's one of the classic values that we bring to customers, reducing cost and complexity. On the business side, we'll now be able to exploit enhanced time dimensions, enabling 'chain of events' and 'point in time' analytics.
So we've had business intelligence around for quite a while. It's getting passé, almost. We're now interested in expanding the time dimension of business intelligence, and with temporal processing easier to implement and understand thanks to native database support, it will become easier and more attractive to start doing business intelligence on temporal historical data. It will be easier to reconstruct historical transaction details, to uncover previously undetected business value in historical data, and ultimately, like any business wants to do, to discover new ways to perform and compete.
Third normal form is still the recommended first step for data modeling and is ideal for decision support solutions. However, in order to meet the demands of the BI class of applications, we must consider using star/snowflake schemas to better match the BI implementation and to be able to support response time requirements.
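A minimal star schema can make the contrast concrete: one fact table surrounded by dimension tables, answering a pre-planned BI question with a simple join-and-aggregate. The tables and figures below are illustrative, not from the source.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# A minimal star schema: one fact table joined to its dimension tables.
con.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE dim_date    (date_id INTEGER PRIMARY KEY, month TEXT);
CREATE TABLE fact_sales  (product_id INTEGER, date_id INTEGER, amount REAL);
INSERT INTO dim_product VALUES (1, 'Books'), (2, 'Music');
INSERT INTO dim_date VALUES (1, '2010-01'), (2, '2010-02');
INSERT INTO fact_sales VALUES (1, 1, 10), (1, 2, 20), (2, 1, 5);
""")
# A typical predetermined BI question: sales by category and month.
rows = con.execute("""
    SELECT p.category, d.month, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product p ON p.product_id = f.product_id
    JOIN dim_date d    ON d.date_id = f.date_id
    GROUP BY p.category, d.month
    ORDER BY p.category, d.month""").fetchall()
print(rows)
# [('Books', '2010-01', 10.0), ('Books', '2010-02', 20.0), ('Music', '2010-01', 5.0)]
```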
All OLAP technologies provide the user with an intuitive multi-dimensional view, allowing the user to analyze data based on multiple dimensions and at varying levels of detail, however, the underlying implementation can vary.
The implementations typically vary based on two key factors: The type of data: the depth or levels of hierarchy within each dimension. Aggregated data vs. detailed data.
Data store: Where the physical data resides. Is it in a physical cube on a server or leverage the data in a relational database.
Let's quickly examine the different implementations.
First we have the physical cube also referred to as MOLAP (Multi-dimensional OLAP).
All dimensions and levels of the OLAP content are extracted, pre-calculated and stored in a cube - multi-dimensional database for analysis. In this case, Teradata is typically used as a data store and the data is extracted and moved into an OLAP server.
The strength of MOLAP is its quick response, because the data is pre-calculated and available on the server, but that comes at a cost: maintenance and scalability.
Moving toward the right is the ROLAP implementation, which relies heavily on the database engine. User requests are submitted to the OLAP engine in the mid-tier, which may house metadata on the server; however, the processing is done in the database. The OLAP engine converts the request into a SQL query that is processed in the database.
The benefits of the ROLAP implementation are lower maintenance cost and scalability in terms of breadth (in terms of more dimensions) and depth of analysis (more levels).
And then there's the hybrid implementation, HOLAP. HOLAP leverages the strengths of MOLAP and ROLAP: some dimensions and levels of your OLAP content are extracted and stored in a cube for analysis, with drill-down into a relational database for detail analysis. Teradata optimizations that are done for MOLAP and ROLAP can be applied to HOLAP. For our partner technology, this is the best compromise solution.
Our message is complementary: we optimize OLAP for BI implementations, and the recommended approach for most solutions is the ROLAP implementation.
Next let's move to the right and discuss the OLAP engine, commonly referred to as the relational OLAP engine.
This implementation relies heavily on the database engine. OLAP requests are submitted to the OLAP engine, which may house metadata on the server; however, the processing is done in the database. The OLAP engine converts the request into a SQL query that is processed in the database.
ROLAP: Virtual cubes powered by analysis in the database
OLAP request processed in the database
Teradata provides Aggregate Join Indexes (AJI) to optimize processing
The benefits of the ROLAP implementation are lower maintenance cost and scalability in terms of breadth and depth of analysis. Since the data is housed in the database, you can query a large number of dimensions and drill down to the atomic level.
Benefits
Unlimited analysis
Application neutral
Current data
Ease of management
The drawbacks were performance and response time back to the user; however, with Teradata's optimizations we can dramatically improve the performance of the ROLAP technique. Teradata leverages aggregate join indexes (AJIs) to precalculate commonly performed aggregations, so the bulk of the calculations are already done and waiting in the database.
Challenges
EDW dependency
Responses in few seconds
ROLAP processing in Teradata Database
Use JI/AJI capabilities
Very quick cube builds and optional automatic updates
Extended “cube” size and numbers of dimensions
Analysis freedom
Very fast query response time
Most OLAP front ends can issue ROLAP queries
On-going JI capabilities and flexibility improvement
PPI for non-compressed JI
Allow JI and triggers on the same table (ROLAP and event processing on same transaction data)
Query optimizations
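Teradata maintains AJIs automatically as base data changes. A rough approximation of that idea, not Teradata's actual mechanism, can be sketched in SQLite with an ordinary summary table kept current by a trigger; table names and figures are invented for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE sales (region TEXT, amount REAL);
-- Summary table standing in for an aggregate join index (AJI)
CREATE TABLE sales_by_region (region TEXT PRIMARY KEY, total REAL);
-- Teradata maintains AJIs automatically; here a trigger approximates that:
-- update the running total, or seed a row for a region seen for the first time.
CREATE TRIGGER maintain_aji AFTER INSERT ON sales
BEGIN
  UPDATE sales_by_region SET total = total + NEW.amount
    WHERE region = NEW.region;
  INSERT INTO sales_by_region
    SELECT NEW.region, NEW.amount
    WHERE NOT EXISTS (SELECT 1 FROM sales_by_region WHERE region = NEW.region);
END;
INSERT INTO sales VALUES ('EU', 10), ('EU', 20), ('US', 5);
""")
# Aggregate queries can now read the precomputed summary directly:
totals = con.execute(
    "SELECT region, total FROM sales_by_region ORDER BY region").fetchall()
print(totals)  # [('EU', 30.0), ('US', 5.0)]
```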
Through the SAS and Teradata partnership utilizing the Teradata Data Set Builder for SAS, we eliminate the need to perform the first three steps.
The SAS Scoring Accelerator for Teradata automatically translates the model to SQL and pushes the model into the Teradata Database where the scoring happens directly against the most recent data available in the Teradata environment.
By implementing this unique set of capabilities from SAS and Teradata, your customer will be able to deploy analytics and score their data much faster since the SAS model now runs in the Teradata Database.
The result is much faster time to actionable business insight.
Teradata’s goal is to help you transform customer insights into actions.
We do that by providing world-class database technology to create both Strategic and Operational intelligence. By strategic intelligence, we mean the insights that back-office workers create by using BI tools and analytic packages to build reports, write ad-hoc analytical queries, create predictive models, and then use these insights to create pre-planned customer dialogues and plans. Cycle times for developing these kinds of strategic insight activities typically take hours or days, sometimes weeks.
By contrast, operational intelligence is the application of strategic intelligence at the front-line, at the point of customer contact. It is the application of historical context, current state, and predictive insights like the next best offer, specialized for one specific customer – the one standing in front of a bank teller, or at a check-in kiosk, or checking out of a retail store on the web or in person. In this case, the operational systems sense inputs (what does the customer want), assess and compare the inputs to the dialogue, and use that to drive a timely and relevant response. Cycle times for operational intelligence are fast – each step of the response may take under a second. For example, it could be a dialogue sequence of customized screen pages for a call center agent to use, a sequence of web pages to be rendered for that customer as she clicks, or a sequence of screens that a bank teller might see. You will go around this cycle numerous times on the right within one session with a customer.
Then the customer walks away, and the results are captured and uploaded from the frontline system back into the database for further analysis. The insight cycle begins again, perhaps after hundreds or thousands of customers have interacted with the company. Changes are noted. By analyzing the data, refining the insights and dialogues, you can make them better for use the next time.