Internal Workshop on Open Source Software Community Project: Smart Grid Open Source Platform,
Supported by NIPA, Korea
- 2010/06/30
More info: http://nephee.or.kr/
1. Smart Grid Open Source Platform
Project Progress and Issues
Du-Ho, Kim @ SKCC
2. Agenda
• I. Smart Grid Platform Architecture
• 1. Platform Overall Architecture
• 2. System Configurations
• 3. Input Data (PMU) Simulator
• 4. Input Data Collector
• 5. Cloud Data Storage
• 6. Distributed Database
• 7. Time-Series Data Analysis and Mining
• II. Development Schedule
• 1. Overall Schedule
• III. Current Issues
• 1. Input Data Simulator and Collector
• 2. Cloud Storage and Distributed DB
• 3. Data Analysis and Mining
4. 1. Platform Overall Architecture
[Figure: platform overall architecture]
• Input Data (PMU) Simulator: PMU1 to PMU6
• Input Data Collector: OpenPDC Collector (Input Adaptor, Action Adaptor, Output Adaptor); Collector Data Agent (Time-Series Sorting)
• Data Analysis & Mining: Algorithms (Power Grid Algorithms, Time-Series Data Mining, Statistics Algorithms, Search Algorithms); Distributed Computing Map & Reduce Framework
• Cloud Storage: Meta Data handler
• Distributed DB: Database (Cassandra, MySQL, Mongo DB)
• Data shown in the figure: Time-Series Data (raw); Index, Mining Data; Summary Data
5. 2. System Configurations
[Figure: system configuration across hosts nephee01 to nephee07]
• Hosts nephee01, nephee02, nephee03:
    VM-PMU1: Simulator; VM-PMU2: Simulator
    VM-1: OpenPDC (Windows2008); VM-2: OpenPDC (Windows2008)
    Name Node (primary); Name Node (2ndary), each with an HDFS HDD
• Hosts nephee04, nephee05, nephee06, nephee07:
    VM-PMU3: Simulator; VM-PMU4: Simulator; VM-PMU5: PMU Simulator; VM-PMU6: PMU Simulator
    VM-I: Input Collector
    DB (Cassandra/MySQL)
    Data Node-1, Data Node-2, Data Node-3, each with an HDFS HDD
7. 3.1. How To Generate Simulator Source Data?
[Figure: source data flow for the simulator]
• Source of measured data: Power Source, PMU, K-WAMS (measured data in K-WAMS format)
• K-WAMS to C37.118/1344 Converter: produces data in IEEE C37.118-2005 / IEEE 1344-1995 format
• Simulator input: Real PMU Data or Sample Data
• Open Source Nephee Project, fed with IEEE C37.118-2005 / IEEE 1344-1995 format data:
    (Nephee) PMU Simulator
    (Nephee) PMU data Concentrator: input adapters
    (Nephee) PMU data Analyzer: cloud platform
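Where real PMU data is not available, the "Sample Data" path could be fed by synthetic measurements. The following is a minimal sketch under stated assumptions, not the project's simulator: the record fields, the 30 frames/s reporting rate, the 345 kV nominal voltage, the 60 Hz nominal frequency, and the slow sinusoidal frequency drift are all illustrative choices, and the output is plain Python objects rather than C37.118 or 1344 frames.

```python
import math
from dataclasses import dataclass


@dataclass
class PmuSample:
    """One synthetic measurement (hypothetical record layout)."""
    pmu_id: str
    timestamp: float      # seconds since the start time
    voltage_mag: float    # volts
    voltage_angle: float  # radians, relative to the nominal phasor
    frequency: float      # Hz


def generate_samples(pmu_id, start, count, rate_hz=30,
                     nominal_v=345_000.0, nominal_f=60.0, drift_hz=0.02):
    """Generate `count` samples with a slow sinusoidal frequency drift."""
    drift_rate = 0.1  # Hz, oscillation rate of the drift (assumption)
    samples = []
    for i in range(count):
        t = i / rate_hz
        phase = 2 * math.pi * drift_rate * t
        freq = nominal_f + drift_hz * math.sin(phase)
        # Angle is the time integral of 2*pi*(freq - nominal_f).
        angle = (drift_hz / drift_rate) * (1 - math.cos(phase))
        samples.append(PmuSample(pmu_id, start + t, nominal_v, angle, freq))
    return samples


samples = generate_samples("PMU1", start=0.0, count=60)
```

Sixty samples at 30 frames/s cover two seconds of data for one simulated PMU; a real test bed would serialize such records into the chosen wire format.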
8. 4. Input Data Collector
[Figure: input data collector]
• Inputs: PMU1 to PMU6
• Input Data Collector, one of:
    OpenPDC Collector (Input Adaptor, Action Adaptor, Output Adaptor)
    <or> Collector Data Agent (Time-Series Sorting)
• Outputs:
    Cloud Storage (Meta Data handler)
    Distributed DB: Database (Cassandra, MySQL, Mongo DB)
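The "Time-Series Sorting" role of the collector data agent can be illustrated with a k-way merge. This is a minimal sketch, assuming each PMU delivers records already ordered by timestamp and that a record is a hypothetical `(timestamp, pmu_id, value)` tuple; the slide does not specify the actual record format or sorting method.

```python
import heapq


def merge_pmu_streams(*streams):
    """Merge per-PMU streams (each sorted by timestamp) into one ordered stream."""
    return heapq.merge(*streams, key=lambda record: record[0])


# Hypothetical records: (timestamp in seconds, PMU id, frequency in Hz)
pmu1 = [(0.000, "PMU1", 59.99), (0.033, "PMU1", 60.00)]
pmu2 = [(0.010, "PMU2", 60.01), (0.043, "PMU2", 60.02)]

merged = list(merge_pmu_streams(pmu1, pmu2))
for record in merged:
    print(record)
```

`heapq.merge` is lazy, so the same approach works on long-running iterators without loading every stream into memory.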
9. 4.1. OpenPDC Collector
[Figure: OpenPDC collector, physical environment vs. logical environment]
• NODE with Input Adaptors (IA1, IA2), Action Adaptor (AA1), and Output Adaptor (OA1)
• Devices: Device1 (IA1), Device2 (IA2)
• metadata
• Services: Service IA1, Service IA2, Service AA1, Service OA1
• OpenPDC: Visualization & Monitoring
10. 4.2. OpenPDC Architecture
[Figure: OpenPDC architecture]
• Microsoft Family: PMUs feeding OpenPDC
• Nephee Framework: Data Agent (with OpenPDC); OpenPDC Legacy; FTP
• Hadoop / HDFS; Data Mining
11. 4.3. About OpenPDC
• Open source project of SuperPDC
• Application set for real-time time-series data
• Processing and management system for fast and continuous phasor data
• Currently SuperPDC handles …
    Space utilization rate of 1.5 GB/hr (36 GB a day)
    Measurement archival rate of 150 million/hr (3.6 billion a day)
    120 online PMUs
    1,850 defined measurements
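The hourly and daily figures above are consistent with each other, and they imply a per-second rate and an average storage cost per measurement. The arithmetic below is a quick check, assuming decimal gigabytes (1 GB = 10^9 bytes); the derived per-second and per-measurement values are not stated on the slide.

```python
GB = 10**9  # assumption: decimal gigabytes

space_per_hour_gb = 1.5
measurements_per_hour = 150_000_000

space_per_day_gb = space_per_hour_gb * 24                 # 36.0 GB a day
measurements_per_day = measurements_per_hour * 24         # 3.6 billion a day
measurements_per_second = measurements_per_hour / 3600    # about 41,667 per second
bytes_per_measurement = space_per_hour_gb * GB / measurements_per_hour  # 10.0 bytes

print(space_per_day_gb, measurements_per_day)
print(round(measurements_per_second), bytes_per_measurement)
```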
12. 4.3. Chukwa / Scribe Collector
[Figure: Input Data Collector (Chukwa)]
• Inputs: PMU1 to PMU6
• Data Processing: Chukwa Agent, Chukwa Collector, HDFS File (Hadoop SequenceFile)
• Post Processing: Archive Builder (M&R), Demux (M&R), Chukwa Record File, Rolling
• Outputs: Cloud Storage (HDFS); Database (Cassandra, MySQL)
[Figure: Input Data Collector (Scribe)]
• Scribe Clients send to a Local Server: Scribe Server (local)
• Local Servers forward to the Central Server: Scribe Server (center)
• [Central Server Failure Case] the local server writes to a Local Log (temp)
• [Central Storage Failure Case] the central server writes to a Local Log (temp)
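The two Scribe failure cases follow the same store-and-forward idea: when the next hop is unavailable, messages go to a temporary local log and are forwarded in order once the next hop recovers. The sketch below models that behavior in memory only. It does not use Scribe's API, and the class names `Upstream` and `BufferingServer` are hypothetical.

```python
from collections import deque


class Upstream:
    """Stand-in for the next hop (central server or central storage)."""

    def __init__(self):
        self.available = True
        self.received = []

    def send(self, message):
        if not self.available:
            raise ConnectionError("upstream unavailable")
        self.received.append(message)


class BufferingServer:
    """Forwards messages upstream; keeps them in a temporary local log on failure."""

    def __init__(self, upstream):
        self.upstream = upstream
        self.local_log = deque()

    def log(self, message):
        self.local_log.append(message)
        self.flush()

    def flush(self):
        # Forward in arrival order; stop at the first failure and keep the rest.
        while self.local_log:
            try:
                self.upstream.send(self.local_log[0])
            except ConnectionError:
                return
            self.local_log.popleft()


central = Upstream()
local = BufferingServer(central)
local.log("pmu1 t=0")
central.available = False   # central server failure case
local.log("pmu1 t=1")
local.log("pmu2 t=1")
central.available = True    # recovery
local.flush()
print(central.received)
```

A message is removed from the local log only after a successful send, so nothing is dropped while the upstream is down.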
14. 6. Distributed Database
[Figure: distributed database within the platform]
• Input Data Collector: Collector (Time-Series Sorting)
• Data Analysis & Mining: Algorithms (Power Grid Algorithms, Time-Series Data Mining, Statistics Algorithms, Search Algorithms); Distributed Computing Map & Reduce Framework
• Cloud Storage: Meta Data handler
• Distributed DB: Database (Cassandra, MySQL, Mongo DB)
• Data shown in the figure: Time-Series Data (raw); Index, Mining Data; Summary Data
15. 7. Time-Series Data Analysis and Mining
[Figure: time-series data analysis and mining]
• Data Analysis & Mining: Algorithms (Power Grid Algorithms, Time-Series Data Mining, Statistics Algorithms, Search Algorithms); Distributed Computing Map & Reduce Framework
• Raw Data <key, val> (Cloud Storage)
• [training] Input Signal (time-series), Signature Extraction, Training (Clustering, Classification), Meta Data Insertion (DB), Database
• [query] Input Signal (time-series), Signature Extraction, Search (Matching) against the Database, Results
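The training and query paths share the signature extraction step, and the query path then matches the extracted signature against stored ones. The sketch below is one simple way to do this, not the project's algorithm: it assumes the signature is a piecewise mean of the z-normalized signal and that matching is a nearest-neighbor search by Euclidean distance. The clustering and classification parts of training are left out, and the labels and signals are made up.

```python
import math


def extract_signature(signal, segments=4):
    """Piecewise mean of the z-normalized signal (assumed signature form)."""
    n = len(signal)
    mean = sum(signal) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in signal) / n) or 1.0
    normalized = [(x - mean) / std for x in signal]
    size = n // segments  # requires len(signal) >= segments
    return [sum(normalized[i * size:(i + 1) * size]) / size
            for i in range(segments)]


def search(database, query_signal):
    """Return the label whose stored signature is closest to the query's."""
    query = extract_signature(query_signal)
    return min(database, key=lambda label: math.dist(database[label], query))


# [training] store signatures of labeled example signals (hypothetical data)
database = {
    "ramp_up": extract_signature([1, 2, 3, 4, 5, 6, 7, 8]),
    "ramp_down": extract_signature([8, 7, 6, 5, 4, 3, 2, 1]),
}

# [query] match a new signal against the stored signatures
result = search(database, [10, 12, 13, 15, 17, 18, 20, 22])
print(result)
```

Normalizing before extraction makes the match depend on the shape of the signal rather than its absolute level.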
16. 7.1. Hadoop Map & Reduce Framework
[Figure: Hadoop Map & Reduce job flow]
• Write MapReduce function
• Run on MapReduce framework
• Job Tracker: gets the tablet list from the META Table
• Task assign to each node (Task Trackers)
• Map Tasks: run on the tablets of Table A (Tablet A-1, A-2, A-3, … A-N)
• Partition using key
• Reduce Tasks: write the tablets of Table B (Tablet B-1, Tablet B-2)
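The flow in the figure (map tasks per tablet, partition by key, reduce tasks producing the output table) can be imitated in a single process. This is a toy sketch of the programming model only, not Hadoop code; the record layout, the choice of "maximum voltage per PMU" as the job, and the CRC-based partitioner are assumptions for illustration.

```python
import zlib
from collections import defaultdict


def map_fn(record):
    """Emit <key, val> pairs; here the key is the PMU id."""
    pmu_id, voltage = record
    yield pmu_id, voltage


def reduce_fn(key, values):
    """Combine all values for one key; here the maximum voltage."""
    return key, max(values)


def partition(key, num_reducers):
    """Deterministically choose the reduce task for a key."""
    return zlib.crc32(key.encode()) % num_reducers


def run_job(tablets, num_reducers=2):
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for tablet in tablets:                  # one map task per input tablet
        for record in tablet:
            for key, value in map_fn(record):
                partitions[partition(key, num_reducers)][key].append(value)
    output = {}
    for part in partitions:                 # one reduce task per partition
        for key, values in part.items():
            out_key, out_value = reduce_fn(key, values)
            output[out_key] = out_value
    return output


# Hypothetical Table A split into two tablets of (PMU id, voltage in kV)
table_a = [
    [("PMU1", 344.9), ("PMU2", 345.2)],
    [("PMU1", 345.6), ("PMU3", 344.1)],
]
table_b = run_job(table_a)
print(table_b)
```

Because every value for a given key lands in the same partition, each reduce task can work independently of the others.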
18. 1. Overall Schedule
[Figure: schedule chart, 2010/5 to 2010/10; task bars are not recoverable from the text]
• Input Collector
    OpenPDC Architecture Analysis
    1344/C37-118 Protocol Analysis
    K-WAMS Review
    PMU Simulator / Test Bed
    Input Collection Test
    Input Collector Design
    Input Collector Test
• Cloud Storage/DB
    HDFS Storage Analysis
    Cloud Storage Design
    Cloud Storage Develop
    Cloud Storage Test
    DB Survey and Test
    DB Development
    Distributed DB Test
• Data Analysis P/F
    Map & Reduce Framework
    Algorithm Implementations (MR)
    Time-Series Mining Algorithms
    Data Analysis Platform Design
    Data Analysis Platform Develop
    Data Analysis Platform Test
    Demo
19. III. Current Issues
1. Input Data Simulator and Collector
2. Cloud Storage and Distributed DB
3. Data Analysis and Mining
20. 1. Issues: Input Data Simulator and Collector
A. Input Data Simulator Issues
• Using measured PMU data as the simulator input
    Should we use measured files or sample files in IEEE C37.118-2005 / 1344-1995 format?
• Selecting input scenarios for the simulator
    Which conditions should be checked from measurements of the Power Grid PMU input data?
    This is tied to the event check part.
    What do the PMU signals look like for each condition?
    e.g.) a 10% change in voltage within 5 secs, a 10% change in center frequency, etc.
B. Input Data Collector Issues
• How to make use of OpenPDC, which runs only on the Microsoft Platform
    Use it as a simulator that replays stored input signals.
    Use it as a real-time event checker for time-series input signals.
• Consider both options: using the signals collected from the output of OpenPDC, and using the input collectors currently under test.
• A mechanism for (near) real-time storage and processing is being implemented with the open source Chukwa, Scribe, and Honu.
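The example condition in item A (a 10% change in voltage within 5 secs) can be checked with a sliding time window. The following is a minimal sketch, assuming samples arrive as `(timestamp, value)` pairs in time order and that "change" means relative deviation from any earlier sample still inside the window; the slides leave the exact event definition open.

```python
from collections import deque


def detect_changes(samples, window_s=5.0, threshold=0.10):
    """Return samples that deviate by >= threshold from a sample in the last window_s seconds."""
    window = deque()
    events = []
    for t, value in samples:
        # Drop samples that are older than the window.
        while window and t - window[0][0] > window_s:
            window.popleft()
        if any(abs(value - old) / abs(old) >= threshold
               for _, old in window if old != 0):
            events.append((t, value))
        window.append((t, value))
    return events


# Hypothetical voltage readings: (seconds, value)
samples = [(0.0, 100.0), (1.0, 101.0), (2.0, 99.0), (3.0, 112.0), (10.0, 100.0)]
events = detect_changes(samples)
print(events)
```

The reading at 3.0 s is flagged because it is 12% above the reading at 0.0 s; the reading at 10.0 s is not, because no earlier sample remains within 5 seconds of it. The same function applies to frequency by passing frequency samples.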
21. 2. Issues: Cloud Storage and Distributed DB
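A. Cloud Storage Issues
• For real-time storage and analysis of large-volume data, the data is first stored in cloud storage (HDFS), and then stored a second time as data sorted by hour/day/month.
B. Distributed Database Issues
• A system is being designed that runs the Data Analysis and Mining algorithms in a distributed, parallel fashion, stores the meta data and index information for the processed results in the DB, and can handle queries from outside.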
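The second-stage storage by hour, day, and month in item A could be organized by deriving a directory per time bucket from each record's timestamp. This is a sketch of one possible layout, not the project's design: the root path `/nephee/sorted`, the directory names, and the use of UTC are assumptions.

```python
from datetime import datetime, timezone


def partition_paths(epoch_seconds, root="/nephee/sorted"):
    """Return the hourly, daily, and monthly directory for a timestamp (UTC)."""
    ts = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return {
        "hour": f"{root}/hourly/{ts:%Y/%m/%d/%H}",
        "day": f"{root}/daily/{ts:%Y/%m/%d}",
        "month": f"{root}/monthly/{ts:%Y/%m}",
    }


# 1277856000 is 2010-06-30 00:00:00 UTC
paths = partition_paths(1277856000)
for level, path in paths.items():
    print(level, path)
```

With such a layout, a query over a time range only needs to read the directories that overlap the range.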
22. 3. Issues: Data Analysis and Mining
A. Data Analysis Issues
• The algorithms for basic Power Grid analysis need to be organized.
    e.g.) how to measure the mean and the range of variation of measured Voltage, Current, and Power values
B. Data Mining Issues
• Which signal patterns to define and detect for Power Grid Data Mining needs to be discussed.
• Efficient methods for Time-Series analysis need to be organized.
C. Data Analysis Platform Issues
• Discuss how to make the whole system flexible so that it can serve as a general (non-) Time-Series Data Analysis Platform, including Power Grid.
• Discuss how to visualize the analyzed data.
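For the basic analysis example in item A (mean and range of variation of measured values), a starting point could be a small summary function. This sketch assumes "range of variation" may be reported either as max minus min or as a standard deviation, since the slides do not fix the definition; the sample voltages are made up.

```python
import statistics


def summarize(values):
    """Mean and two candidate measures of variation for a list of measurements."""
    return {
        "mean": statistics.fmean(values),
        "min": min(values),
        "max": max(values),
        "range": max(values) - min(values),
        "stdev": statistics.pstdev(values),  # population standard deviation
    }


voltage = [344.0, 346.0, 345.0, 345.0]  # hypothetical readings in kV
summary = summarize(voltage)
print(summary)
```

The same function applies unchanged to Current and Power series.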