Tensorflow service & Machine Learning

Building your own TensorFlow service development environment
Machine Learning for structured data
WRITTEN BY jeeHyunPaik

About the instructor
백지현
POSCOICT Manager
intwis100@Gmail.com

Contents
Building your own TensorFlow service development environment
Part 1. Serving TensorFlow as a web service
Part 2. First steps toward REST: Django and nginx configuration
Part 3. Asynchronous processing with RabbitMQ and Celery
Machine Learning for structured data
Part 1. Preprocessing
Part 2. Error in modeling: bias and variance
Part 3. A new ensemble algorithm: boosting
Part 4. XGBoost, LightGBM
Part 5. Deep learning

Part 1. Serving TensorFlow as a web service

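How to serve TensorFlow
I built an AI network, but how do I actually use it?
I built a neural network in TensorFlow, but how do I connect it to other systems?
Can several models each be trained separately?
Can several systems request predictions from a trained model?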

TensorFlow Serving is a flexible, high-performance serving system for machine
learning models, designed for production environments. TensorFlow Serving makes it
easy to deploy new algorithms and experiments, while keeping the same server
architecture and APIs.
TensorFlow Serving is a flexible, high-performance machine learning serving system for production environments.
However, it serves TensorFlow models only.

Django is a high-level Python Web framework that encourages rapid development and
clean, pragmatic design. Built by experienced developers, it takes care of much of the
hassle of Web development, so you can focus on writing your app without needing to
reinvent the wheel. It’s free and open source.
- Very fast
- Ships with everything needed for web handling
- Strong on security
- Scalable
- An extremely versatile framework
https://www.djangoproject.com/

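Basic requirements for serving a neural network
- Training (Train)
. Training takes a long time, so several training tasks must be able to run asynchronously
- Prediction (Predict)
. Prediction must work independently of training
. Requests must be handled concurrently
- Distributed processing is a plus
Let's build the service by combining several open-source tools.
Each of them is explained later.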

Switching to Docker Compose
<Running each container separately: one command per container, and the commands are hard to get right>
nginx -> docker run --name some-nginx -d -p 8080:80 some-content-nginx
postgres -> docker run --name some-app --link some-postgres:postgres -d application-that-uses-postgres
<docker-compose: the setup is structured in one file and has its own run commands>
version: '3'
services:
  web:
    build: .
    ports:
      - "5000:5000"
    volumes:
      - .:/code
      - logvolume01:/var/log
    links:
      - redis
  redis:
    image: redis
volumes:
  logvolume01: {}
Docker Compose
A tool for defining and running Docker applications that use multiple containers

Compose is a tool for defining and running multi-container Docker
applications. With Compose, you use a Compose file to configure
your application’s services. Then, using a single command, you
create and start all the services from your configuration.

Docker Compose images
type             name                      etc
WAS (Django)     tf_edu_docker_skp:v1.2    modified
db               postgres:latest           official
httpd            nginx:latest              official
job tasker       tf_edu_docker_skp:v1.2    modified (Celery)
message broker   rabbitmq:latest           official

Part 2. First steps toward REST
Django and nginx configuration

Nginx and Django configuration
Check the Dockerfile of the development Docker image
git clone https://github.com/TensorMSA/skp_edu_docker.git
Contents of the development image (Dockerfile)
- TensorFlow 1.2.1
- python 3.5
- conda
- nlp libraries (mecab, ngram)
- jupyter

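Folders and files in the development Docker repository
Environment settings file (Jupyter password)
Run scripts for Django, Celery, and Jupyter
Docker Compose configuration file
REST source files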

django==1.10.5
uwsgi==2.0.14
psycopg2==2.7.2
django_jenkins==0.110.0
django-rest-swagger==2.1.1
djangorestframework==3.5.3
celery==4.0.2
django-celery==3.2.1
nltk==3.2.1
konlpy==0.4.4
gensim==1.0.1
keras==2.0.2
flower==0.9.1
neo4j-driver==1.2.1
opencv-python==3.2.0.7
django-cors-headers==2.0.2
hanja==0.11.0
ngram==3.3.2
gunicorn==19.6.0
requirement.txt
Check the pip install list used by the development image
- django : WAS
- psycopg2 : Postgres library
- django rest : RESTful framework for Django
- celery : task queue
- gunicorn : web server gateway (WSGI server)
- assorted machine learning libraries
- assorted NLP libraries

0. Back up the preconfigured files and folders, then replace them
mv code code_bk
mkdir code
mv docker-compose.yml docker-compose.yml_bk
1. Create docker-compose.yml
vi docker-compose.yml
version: '3'
services:
  db:
    image: postgres:latest
    env_file: .env
  web:
    image: hoyai/tf_edu_docker_skp:v1.2
    env_file: .env
    volumes:
      - ./code:/code
      - ./code/static:/static
    ports:
      - "8888:8888"
      - "8989:8989"
      - "5555:5555"
      - "8000:8000"
    depends_on:
      - db
Bring docker-compose up in the background
docker-compose up -d
** If other containers are already running, a port conflict can cause an error.
Check with docker ps and stop those containers first.
< Result >

2. Create the Django project
docker-compose exec web django-admin.py startproject tfrest .
Run this in the directory that contains docker-compose.yml. Name the project tfrest.
3. Edit ./code/tfrest/settings.py for the DB connection
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql_psycopg2',
        'NAME': 'postgres',
        'USER': 'postgres',
        'HOST': 'db',
        'PORT': 5432,
    }
}
ALLOWED_HOSTS = ['*']
4. Run the Django Postgres migration
docker-compose exec web python manage.py makemigrations
docker-compose run web python manage.py migrate
This must also be run in the directory that contains docker-compose.yml.
If it fails, step 2 was probably run from a different location. Delete the code directory and start again from step 2.

4. Enter the container and test
docker ps
docker exec -it <container id> bash
python /code/tfrest/manage.py runserver 0.0.0.0:8000
Check at http://xxx.xx.xx.xxx:8000.
Django's runserver is a test server only, so it will be replaced with nginx later.

5. Add a startup script to docker-compose
docker-compose down
vi docker-compose.yml
Add the startup script under command
  web:
    env_file: .env
    volumes:
      - ./code:/code
    ports:
      - "8888:8888"
      - "8989:8989"
      - "5555:5555"
      - "8000:8000"
    command: bash -c "(/run_django.sh &) && /run_jupyter.sh"
    depends_on:
      - db
Check Jupyter at http://xxx.xx.xx.xxx:8888.

6. Django REST configuration
docker-compose exec web python manage.py startapp api
Create the api app (run this in the directory that contains docker-compose.yml).
Once the api Django app has been created, add the Django REST framework to settings.py.
INSTALLED_APPS = [
    'django.contrib.admin',
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.messages',
    'django.contrib.staticfiles',
    'rest_framework',
]

7. Copy the TensorFlow logic files for the RESTful service
cd code/api/
cp ../../code_bk/api/tf* .
This copies the TensorFlow and REST API code
https://github.com/TensorMSA/skp_edu_docker/tree/master/code/api
Edit code/api/__init__.py
from .tf_service_exam import TfExamService
from .tf_service_celery import TfServiceCelry
tf_service_exam.py → RESTful post, get, put, delete
tf_service_celery.py → RESTful post, get, put, delete
tf_service_logic.py → TensorFlow logic
tf_service_celery_logic_task.py → Celery TensorFlow logic

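RESTful API
- A URI should represent a resource (use nouns rather than verbs)
- Actions on a resource are expressed with HTTP methods
POST    Requesting the URI with POST creates a resource.
GET     GET retrieves the resource and returns detailed information about it.
PUT     PUT modifies the resource.
DELETE  DELETE removes the resource.
REST
- Introduced in the 2000 doctoral dissertation of Roy Fielding, one of the principal authors of the HTTP specification
- He judged that the web of the time was not making full use of the strengths of its original design,
so he designed an architecture that exploits the web's advantages as far as possible
- Representational State Transfer (REST)
- Characteristics: uniform interface, stateless, cacheable, self-descriptive, client-server, layered system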

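TensorFlow code explained
- TensorFlow expresses a computation as a graph made of variables and operations
- A Session is the object that executes a graph: put the graph into a session and run it
Methods to implement in the REST service
tf.add(a,b) : adds a and b
tf.reduce_sum(100) : the reduce_sum operation over the list [0, ..., 100]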

8. Configure the URL patterns
vi /tfrest/urls.py
Located in the project root
from django.conf.urls import url
from django.contrib import admin
from api import TfExamService as tf_service
from django.views.decorators.csrf import csrf_exempt
from api import TfServiceCelry as tf_service_celery
urlpatterns = [
    url(r'^admin/', admin.site.urls),
    url(r'^api/test/type/example1/operator/(?P<operator>.*)/values/(?P<values>.*)/',
        csrf_exempt(tf_service.as_view())),
    url(r'^api/test/type/celeryexam1/operator/fib/values/(?P<values>.*)/',
        csrf_exempt(tf_service_celery.as_view())),
]
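The two API patterns can be tried outside Django with the standard re module, which may help when a request unexpectedly returns 404. This is only a sketch: the helper names parse_example_path and parse_celery_path are made up for illustration, and it assumes Django matches these patterns against the request path without its leading slash.

```python
import re

# Patterns copied from urlpatterns; Django applies them to the path without the leading "/".
EXAMPLE_PATTERN = r'^api/test/type/example1/operator/(?P<operator>.*)/values/(?P<values>.*)/'
CELERY_PATTERN = r'^api/test/type/celeryexam1/operator/fib/values/(?P<values>.*)/'


def parse_example_path(path):
    """Return the named groups captured from an example1 path, or None if it does not match."""
    match = re.match(EXAMPLE_PATTERN, path)
    return match.groupdict() if match else None


def parse_celery_path(path):
    """Return the named groups captured from a celeryexam1 path, or None if it does not match."""
    match = re.match(CELERY_PATTERN, path)
    return match.groupdict() if match else None


if __name__ == "__main__":
    print(parse_example_path("api/test/type/example1/operator/add/values/1,2/"))
    print(parse_celery_path("api/test/type/celeryexam1/operator/fib/values/100/"))
```

The captured groups are what the view methods receive as keyword arguments, so the values string still has to be split and converted inside the view.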

9. Connect nginx to Django
/config/nginx/tfrest.conf
Check the nginx configuration file. It is already committed to the git repository.
upstream web {
    ip_hash;
    server web:8000;
}
# portal
server {
    location /static/ {
        autoindex on;
        alias /static/;
    }
    location / {
        proxy_pass http://web/;
    }
    listen 8000;
    server_name localhost;
}
- Port: 8000
- Serves Django static files
** If static is not connected, pages will render broken later.
nginx must be configured so that it can access the js files.

10. Edit settings.py for the Django static files
# Static files (CSS, JavaScript, Images)
# https://docs.djangoproject.com/en/1.10/howto/static-files/
STATIC_ROOT = os.path.join(BASE_DIR, 'static')
STATIC_URL = '/static/'
STATICFILES_DIRS = [
    os.path.join(BASE_DIR, "api/static"),
]
11. Run collectstatic
Run from the code directory
mkdir -p ./api/static (?)
docker-compose run web python manage.py collectstatic
Answer yes when prompted

12. Add nginx (the web server) to docker-compose
  nginx:
    image: nginx:latest
    ports:
      - "8000:8000"
    volumes:
      - ./code:/code
      - ./config/nginx:/etc/nginx/conf.d
    depends_on:
      - web
    env_file: .env
  web:
    command: bash -c "(gunicorn tfrest.wsgi -b 0.0.0.0:8000 &) && (flower -A tfrest &) && /run_jupyter.sh;"
Remove port 8000 from the ports of the web service (nginx now listens on 8000)

13. docker-compose down & up
docker-compose down
docker-compose up -d
Connect to http://xxx.xx.xx.xxx:8000/admin

Django configuration
14. Postgres migration and admin setup
docker-compose exec web python manage.py migrate
docker-compose exec web python manage.py createsuperuser
15. Enter the admin credentials
admin
admin@admin.com
Enter the password twice

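Django configuration
16. docker-compose stop, then start
Logging in as admin works
17. docker-compose down, then up
Logging in as admin fails (down removes the db container, and the data stored inside it)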

Part 2. Docker volume for Postgres

Docker volume
- A Docker data volume stores data on the host rather than inside the container
- Data volumes are mainly used to share data between containers
- A data volume is a specially-designated directory within one or more containers that
bypasses the Union File System. Data volumes provide several useful features for
persistent or shared data:
- Volumes are initialized when a container is created. If the container’s parent image
contains data at the specified mount point, that existing data is copied into the new
volume upon volume initialization. (Note that this does not apply when mounting a
host directory.)
- Data volumes can be shared and reused among containers.
- Changes to a data volume are made directly.
- Changes to a data volume will not be included when you update an image.
- Data volumes persist even if the container itself is deleted.

Postgres & Docker volume configuration
18. Create a Docker volume
docker volume create --name=pg_data
docker volume ls
Other docker volume commands
docker volume inspect <volume_name>
docker volume rm <volume_name>

Postgres & Docker volume configuration
19. Point Postgres at the data volume
  db:
    image: postgres:latest
    env_file: .env
    volumes:
      - pg_data:/var/lib/postgresql/data/
volumes:
  pg_data:
    external: true
20. Restart docker-compose and check
docker-compose down
docker-compose up -d
docker-compose exec web python manage.py migrate
docker-compose exec web python manage.py createsuperuser
docker-compose down
docker-compose up -d
The DB data now survives even when the container is removed

21. Check the REST API
curl -X POST http://xx.xx.xx.xx:8000/api/test/type/example1/operator/add/values/300/
curl -X GET http://xx.xx.xx.xx:8000/api/test/type/example1/operator/add/values/1,2/
Prediction is handled by the GET method
Training is handled by the POST method

22. When training takes a long time
curl -X POST http://xx.xx.xx.xx:8000/api/test/type/example1/operator/add/values/100000000/
The client has to wait until the POST finishes...

Part 3. Asynchronous processing with RabbitMQ and Celery

Celery
- Celery is an asynchronous task queue/job queue written in Python

RabbitMQ
- RabbitMQ is open-source message broker software that implements the standard AMQP (Advanced Message Queuing Protocol)
- RabbitMQ is written in Erlang and built on the OTP framework for clustering and failover
- RabbitMQ clients are available in many languages
- Used for batch jobs over large volumes of data, chat services, and asynchronous data processing
<Celery and RabbitMQ architecture>
https://docs.google.com/presentation/d/185sirdtEzVm59oGAivd7vILeffn6ZeXA22b6PXS1vM4/edit#slide=id.i61

Celery and RabbitMQ flow
https://docs.google.com/presentation/d/185sirdtEzVm59oGAivd7vILeffn6ZeXA22b6PXS1vM4/edit#slide=id.i61

Celery configuration
23. The Celery libraries are already installed in the development image
requirement.txt
djangorestframework==3.5.3
celery==4.0.2
django-celery==3.2.1
24. Add RabbitMQ to docker-compose
docker-compose.yml
  rabbit:
    hostname: rabbit
    image: rabbitmq:latest
    environment:
      - RABBITMQ_DEFAULT_USER=admin
      - RABBITMQ_DEFAULT_PASS=mypass
    depends_on:
      - web
    ports:
      - "5672:5672" # we forward this port because it's useful for debugging
      - "15672:15672" # here, we can access rabbitmq management plugin
docker-compose down

Celery configuration
25. Start RabbitMQ
docker-compose up -d
docker-compose ps

Celery configuration
26. Edit settings.py so that Django can reach RabbitMQ
vi tfrest/settings.py
BROKER_URL = 'amqp://admin:mypass@rabbit//'
CELERY_ACCEPT_CONTENT = ['json']
CELERY_RESULT_BACKEND = 'db+sqlite:///results.sqlite'
CELERY_TASK_SERIALIZER = 'json'
CELERY_HIJACK_ROOT_LOGGER = False
27. Create the Celery configuration file
vi tfrest/celery.py
from __future__ import absolute_import, unicode_literals
import os
from celery import Celery
import logging
from django.conf import settings
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'tfrest.settings')
app = Celery('tfrest')
app.config_from_object('django.conf:settings')
app.autodiscover_tasks(lambda: settings.INSTALLED_APPS)
CELERYD_HIJACK_ROOT_LOGGER = False

Celery configuration
28. Edit __init__.py so the Django project can use Celery
vi tfrest/__init__.py
from __future__ import absolute_import, unicode_literals
from .celery import app as celery_app
__all__ = ['celery_app']

Celery configuration
29. Define the Celery shared task (@shared_task)
- Put @shared_task on the method that should run as a Celery task
- Call it with delay from the REST view
tf_service_celery.py
import json
import logging
from rest_framework.response import Response
from rest_framework.views import APIView
from api.tf_service_celery_logic_task import train

class TfServiceCelry(APIView):
    def post(self, request, values):
        try:
            logging.info("celery test start")
            # delay() queues the task and returns at once with a result handle
            result = train.delay(int(values))
            result_data = {"status": "200", "result": str(result)}
            return Response(json.dumps(result_data))
        except Exception as e:
            logging.error(str(e))
            raise
tf_service_celery_logic_task.py
from __future__ import absolute_import, unicode_literals
from celery import shared_task
from .tf_service_logic import TfExamBackendService as tebs

@shared_task
def train(num):
    print("train delay started")
    try:
        tf_class = tebs()
        tf_result = tf_class.tf_logic_train_reduce_sum(num)
    except Exception as e:
        print(str(e))
        tf_result = str(e)
    return tf_result
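The idea behind calling a task with delay can be imitated with the standard library alone. The sketch below is an analogy, not Celery: a ThreadPoolExecutor stands in for the broker and the workers, and the train and post functions here are simplified stand-ins written for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def train(num):
    """Stand-in for a slow training job (hypothetical): sums 0..num-1."""
    time.sleep(0.1)
    return sum(range(num))


# The executor plays the role of the message broker plus the worker processes.
executor = ThreadPoolExecutor(max_workers=2)


def post(values):
    """Queue the job and answer immediately, like a REST view that calls train.delay()."""
    future = executor.submit(train, int(values))
    return {"status": "200"}, future


if __name__ == "__main__":
    response, future = post("1000")
    print(response)         # available before the job has finished
    print(future.result())  # blocks until the worker is done
```

The caller gets its response without waiting for the job, which is the behaviour the POST endpoint needs when training is slow.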

Celery configuration
30. Celery daemon startup script
run_celery.sh
#!/bin/bash
celery -A tfrest worker -l info
- The celery server must run celery instead of nginx
31. Add Celery to docker-compose
  celery:
    volumes:
      - ./code:/code
    command: bash -c "/run_celery.sh;"
    depends_on:
      - rabbit

Celery configuration
32. docker-compose down & up

Celery configuration
33. Check the Docker network
docker network ls
docker network inspect <network id>

Celery configuration
33. Run a task through Celery
curl -X POST http://xx.xx.xx.xx:8000/api/test/type/celeryexam1/operator/fib/values/100000000/
If you do not know the URI pattern, look it up in urls.py:
url(r'^api/test/type/celeryexam1/operator/fib/values/(?P<values>.*)/',
    csrf_exempt(tf_service_celery.as_view())),
34. Check the Celery result

Celery configuration
35. Install Celery flower to inspect Celery task results
requirement.txt
celery==4.0.2
flower==0.9.1
- Already installed in the development image
36. Configure docker-compose so that flower runs alongside the Django web service
docker-compose.yml
  web:
    command: bash -c "(gunicorn tfrest.wsgi -b 0.0.0.0:8000 &)&&(flower -A tfrest &)&&/run_jupyter.sh;"

Celery configuration
37. Connect to Celery flower
- http://xx.xx.xx.xx:5555
- Send a POST request

Celery configuration
38. Confirm that multiple tasks run one after another
The task results can be inspected

Celery configuration
39. docker-compose can scale out
docker-compose scale celery=5
The scale command starts 5 celery containers at once

Celery configuration
- In flower, the number of workers also jumps to 5 at once
- Fire off many POST requests

End of: Building your own TensorFlow service development environment

Machine Learning for structured data

Scope of AI, machine learning, and deep learning

AI, Machine Learning, Pattern Recognition, Deep Learning
The algorithm has to fit the situation and the data.
The following slides cover the basic theory needed to use machine learning or deep learning.

Structured data analysis PIPELINE
Data must be transformed before machine learning can be applied

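Preprocessing
Data cleansing (handling null values)
- Delete
. The simplest approach
. Needs care when 10% or more of the data is missing
. Needs care when the missing data is itself meaningful
Example: if age is optional in a survey, a missing age may characterize respondents who are sensitive about personal information
- Filling
. Fill in the missing data
. Categorical data: replace with Unknown or a designated value
. Continuous data: 0, the mean, or a similar value can be used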
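The two strategies can be sketched in plain Python. The rows and column names below are invented for illustration; in practice a pandas DataFrame offers the same operations, but the standard library is enough to show the logic.

```python
from statistics import mean

# Hypothetical rows; None marks a missing value.
rows = [
    {"age": 22.0, "embarked": "S"},
    {"age": None, "embarked": "C"},
    {"age": 38.0, "embarked": None},
    {"age": 30.0, "embarked": "S"},
]


def drop_missing(rows):
    """Delete: keep only the rows that have no missing value."""
    return [r for r in rows if all(v is not None for v in r.values())]


def fill_missing(rows, numeric_keys, categorical_keys, unknown="Unknown"):
    """Filling: column mean for continuous fields, a placeholder for categorical ones."""
    means = {k: mean(r[k] for r in rows if r[k] is not None) for k in numeric_keys}
    filled = []
    for r in rows:
        new_row = dict(r)
        for k in numeric_keys:
            if new_row[k] is None:
                new_row[k] = means[k]
        for k in categorical_keys:
            if new_row[k] is None:
                new_row[k] = unknown
        filled.append(new_row)
    return filled


if __name__ == "__main__":
    print(len(drop_missing(rows)))
    print(fill_missing(rows, ["age"], ["embarked"]))
```

Deleting halves this tiny data set, which is why the slide advises caution once a large share of the rows has gaps.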

Preprocessing
Feature Engineering
. Encode categorical data
. Convert continuous data into categories
. Combine features to create a new feature
. Find the importance of each feature and remove the ones that are not needed
A few essential libraries for feature engineering:
Viewing data and simple operations: pandas
Machine learning models: scikit-learn
Gradient boosting library: xgboost
Data visualization: matplotlib
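Three of the transformations listed above can be sketched without any library. The cut points, category list, and the Titanic-style column names (sibsp, parch) are assumptions chosen for illustration, not values taken from the course notebook.

```python
def one_hot(value, categories):
    """Encode a categorical value as a 0/1 vector over a fixed category list."""
    return [1 if value == c else 0 for c in categories]


def bin_age(age, edges=(12, 18, 60)):
    """Turn a continuous age into a bin index: 0 child, 1 teen, 2 adult, 3 senior."""
    for index, edge in enumerate(edges):
        if age < edge:
            return index
    return len(edges)


def family_size(sibsp, parch):
    """Combine two features into a new one: siblings/spouses + parents/children + self."""
    return sibsp + parch + 1


if __name__ == "__main__":
    print(one_hot("C", ["S", "C", "Q"]))
    print(bin_age(30))
    print(family_size(1, 2))
```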

Preprocessing
Pandas basics and a walkthrough of Titanic data preprocessing
(jupyter)

Part 2. Error in modeling: bias and variance

Modeling
The fundamental question behind building a good model:
how do we reduce the error?
Error is measured with
MSE (Mean Squared Error)
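As a minimal sketch, MSE is the average of the squared differences between the true values and the predictions. The function name is an illustrative choice.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between targets and predictions."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    print(mean_squared_error([1, 2, 3], [1, 2, 5]))
```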

Modeling Error
- Types of error
. Training error
  Training error can be driven to 0 (overfitting)
  Test error matters more than training error
. Test error (prediction error)
- To reduce the error, both bias and variance have to be reduced
- However, bias and variance are in a trade-off
- Lowering one raises the other

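bias and variance
The error is the difference between G (the predicted function) and F (the ideal true function)
- F is the true function
. It exists but cannot be observed
- G is the estimated function
. The measured data are random, so G is random as well
. Many different random functions can be drawn
[Figure: error between the true function and the estimated functions]
http://blog.naver.com/sw4r/221010524137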

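bias and variance
- Expressing the error between the true function and the estimated function
G is the estimated function built from one particular sample (D)
F(x) is the function that actually exists
Definition of expected value
- Take the expected value of the error between the true function and the estimated function
- It is the expected value of G (a random function) over all samples (D: a random value)
The expected value of all errors: Expected Prediction Error
* The expected value of the function G over all X is written Eg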

bias and variance
Decomposing the prediction error splits it into
the square of the bias plus the variance
(a+b)^2 = a^2 + b^2 + 2ab
E means looking across all possible distributions
bias and variance
- For a fixed data set (say 100 training samples)
. Bias and variance are in a trade-off
. Bias and variance cannot both decrease at the same time
- With more samples, bias and variance can decrease together
- Training should aim to reduce both bias and variance
[Figure: error E versus iterations (iter), with under-fitting and over-fitting regions, for 50, 100, and 500 samples]
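A small simulation can make both statements concrete: the error splits into squared bias plus variance, and with more samples both parts can shrink. The sketch uses a deliberately simple, hypothetical estimator (the sum of n noisy observations divided by n + 1, which is slightly biased toward zero); the true value, noise level, and seed are arbitrary choices.

```python
import random


def simulate(n_samples, n_datasets=2000, true_value=2.0, noise_sd=1.0, seed=0):
    """Return (error, bias squared, variance) of the estimator over many random data sets."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_datasets):
        ys = [true_value + rng.gauss(0.0, noise_sd) for _ in range(n_samples)]
        # Shrunken mean: biased toward 0, with lower variance than the plain mean.
        estimates.append(sum(ys) / (n_samples + 1))
    mean_estimate = sum(estimates) / n_datasets
    bias_sq = (mean_estimate - true_value) ** 2
    variance = sum((g - mean_estimate) ** 2 for g in estimates) / n_datasets
    error = sum((g - true_value) ** 2 for g in estimates) / n_datasets
    return error, bias_sq, variance


if __name__ == "__main__":
    for n in (5, 50, 500):
        error, bias_sq, variance = simulate(n)
        print(n, round(error, 4), round(bias_sq, 4), round(variance, 4))
```

For every sample size the first number equals the sum of the other two up to floating-point rounding, and all three fall as n grows.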

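bias and variance
Under fitting
- Bias is high
- The error keeps occurring persistently
- There is error, but the model is stable
Over fitting
- Variance is high
- Sometimes it fits well, sometimes it does not
- The error is usually small, but there is no telling when it will be wrong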

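bias and variance
- Ways to reduce variance with the same samples
. bagging
. (increase the number of samples) -> usually not feasible in practice
. Regularization (penalize the model as its formula becomes more complex)
. and so on
Variance is very high: the model is complex.
Bias is high: the model is stable.
* Bias is a persistent error, so training further reduces bias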

Part 3. Ways to reduce variance
Bagging, Boosting

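Bagging
How can variance be reduced with the same samples?
- bagging
- (increase the number of samples)
- Regularization
- and so on
Bagging
1) Start with the training sample
2) Select from it at random
3) This forms the first subset
4) Put the selected items back into the training sample
5) Select at random again
6) This forms the second subset
7) Repeat up to n subsets
8) Attach a regressor or classifier to each subset
9) Decide by the mean for regression,
by majority vote for classification
- Typical example: random forest
What is the advantage?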
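The nine steps can be sketched in a few lines. The base learner fit_mean is a hypothetical stand-in (it always predicts the mean target of its subset); a real setup would fit a decision tree or another model to each subset.

```python
import random
from collections import Counter


def bootstrap_subsets(sample, n_subsets, rng):
    """Steps 2-7: draw n_subsets subsets, each sampled with replacement."""
    return [[rng.choice(sample) for _ in sample] for _ in range(n_subsets)]


def fit_mean(subset):
    """Hypothetical base regressor: always predicts the mean target of its subset."""
    mean_y = sum(y for _, y in subset) / len(subset)
    return lambda x: mean_y


def bagged_regression(sample, n_subsets, x, seed=0):
    """Steps 8-9 for regression: one model per subset, then average the predictions."""
    rng = random.Random(seed)
    models = [fit_mean(s) for s in bootstrap_subsets(sample, n_subsets, rng)]
    predictions = [model(x) for model in models]
    return sum(predictions) / len(predictions)


def majority_vote(labels):
    """Step 9 for classification: the most common predicted label wins."""
    return Counter(labels).most_common(1)[0][0]


if __name__ == "__main__":
    data = [(1, 2.0), (2, 4.0), (3, 6.0), (4, 8.0)]
    print(bagged_regression(data, n_subsets=25, x=2.5))
    print(majority_vote(["cat", "dog", "cat"]))
```

Because each subset is drawn with replacement, some points appear more than once and others not at all, so the individual models differ and their average is steadier than any single one.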

Bagging
What is the advantage of Bagging (Bootstrap Aggregating)?
The advantage is that overfitting can be avoided

Bagging
- Random forest, built on bagging
https://www.stat.berkeley.edu/~breiman/randomforest2001.pd
Definition : A random forest is a classifier consisting of a collection
of tree-structured classifiers {h(x,Θk ), k=1, ...} where the {Θk} are
independent identically distributed random vectors and each tree
casts a unit vote for the most popular class at input x .
- A random forest is a classifier
- It is a collection of many trees
- The Θk are independent random vectors (the ideal case)
- The decision is made by majority
In practice completely independent trees cannot be built,
because bagging inevitably draws some samples more than once

Bagging
How random forest reduces variance
- Do a lot of bagging
- Use random sampling
- Select features at random (to reduce correlation)
Bagging lowers the variance, but the cases it gets wrong
it still gets wrong
Boosting, which gives more weight to the cases that are missed, came out of this

Bagging
Random forest example
- Hands-on exercise

Part 3. A new ensemble algorithm: boosting

Boosting (AdaBoost)
Boosting
- Trains so that the hard cases get more focus and are predicted better
- Bagging learns in parallel, while boosting learns sequentially
- Combines simple, weak learners into a more accurate,
strong learner

AdaBoost algorithm
1) Initialize weights (assign each training point equal weight)
2) Calculate error rates for each h
3) Pick the best h, the one with the smallest error rate
4) Calculate the voting power for h
5) Finished? Stop if H is good enough, if enough rounds have run, or if no good classifier is left (the best h has an error rate of ½)
6) If NO, update the weights to emphasize the points that were misclassified, then go back to 2)
* A wrong answer gets more weight

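- AdaBoost example
Training points: A, B, D, E are +, C is - (the x axis is marked at 1, 3, 5)
A decision tree is assumed as the weak classifier
A classifier predicts + where its inequality holds; ties go to the classifier listed first
There are 5 training points, so each weight starts at 1/5

Weight   Round1   Round2         Round3
Wa       1/5      (right) 1/8    (right) 1/12
Wb       1/5      (right) 1/8    (wrong) 3/12
Wc       1/5      (wrong) 4/8    (right) 4/12
Wd       1/5      (right) 1/8    (right) 1/12
We       1/5      (right) 1/8    (wrong) 3/12

Error rate of each classifier h(x)
h(x)     misclassified   Round1   Round2   Round3
x<2      B E             2/5      2/8      6/12
x<4      B C E           3/5      6/8      10/12
x<6      C               1/5      1/2      4/12
x>2      A C D           3/5      6/8      6/12
x>4      A D             2/5      2/8      2/12
x>6      A B D E         4/5      4/8      8/12

               Round1   Round2   Round3
best h(x)      x<6      x<2      x>4
error rate     1/5      1/4      1/6
voting power   ½ ln4    ½ ln3    ½ ln5

A point that is classified correctly loses weight
A point that is classified wrongly gains weight
Round3 weights: x<2, chosen in Round2, gets B and E wrong
After the first voting power is computed, ask: can we stop here?
H(x) = SIGN(½ ln4 (x<6) + ½ ln3 (x<2) + ½ ln5 (x>4))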
Comparison of Bagging and Boosting

XGBoost
- Widely used on Kaggle
- Improves the speed and performance of gradient boosted decision trees
- Applicable to many kinds of problems
. Binary classification
. Multiclass classification
. Regression
. Learning to Rank
https://github.com/dmlc/xgboost/tree/master/demo#machine-learning-challenge-winning-solutions

LightGBM
- A fast, distributed, high performance gradient boosting (GBDT,
GBRT, GBM or MART) framework based on decision tree
algorithms, used for ranking, classification and many other
machine learning tasks. It is under the umbrella of the
DMTK(http://github.com/microsoft/dmtk) project of Microsoft.
Most libraries grow trees level-wise
LightGBM grows trees leaf-wise
http://github.com/microsoft/dmtk

Reference # K-fold validation
K-fold cross-validation
- Split the training data into K folds; use one fold
as validation data and the remaining folds
as training data
- Then switch the fold used for validation
and train again in the same way
- This gives K runs in total, and the model's error
is the average over the K runs
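The splitting and averaging can be sketched as follows. The function names and the fit/error callables are illustrative assumptions; scikit-learn provides a ready-made KFold class for real use.

```python
def k_fold_splits(n_items, k):
    """Yield (train_indices, validation_indices) for each of the k folds."""
    if not 2 <= k <= n_items:
        raise ValueError("k must be between 2 and the number of items")
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_items // k + (1 if i < n_items % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_items))
        yield train, validation
        start += size


def cross_validate(xs, ys, k, fit, error):
    """Train k times, each time validating on a different fold, and average the errors."""
    scores = []
    for train_idx, val_idx in k_fold_splits(len(xs), k):
        model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        predictions = [model(xs[i]) for i in val_idx]
        scores.append(error([ys[i] for i in val_idx], predictions))
    return sum(scores) / k


if __name__ == "__main__":
    for train, validation in k_fold_splits(10, 3):
        print(train, validation)
```

Every item is used for validation exactly once, so the averaged error draws on all of the training data.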

Reference # Stacking
Is there a way to improve on boosting?
- Why train only weak learners?
- Can't strong learners be developed further?
- Gather strong learners and train a second-level model on them (meta learning)
- This helps find the best combination of base learners
Many stacking models exist

XGBoost example
XGBoost and Stacking example

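Why deep learning?
Deep Learning
- Merges the conventional two-stage process of feature extraction and pattern classification
into a single stage
- For very high-dimensional, complex data such as images, automatically extracts features
that a preprocessing step might have discarded
- Beats traditional machine learning methods in object recognition, speech recognition, and similar tasks
- Several techniques for learning from unlabeled data have been discovered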

Deep learning example
Generalized linear models with nonlinear feature transformations are widely used for large-scale regression and classification problems with sparse inputs.
Memorization of feature interactions through a wide set of cross-product feature transformations are effective and interpretable, while generalization
requires more feature engineering effort. With less feature engineering, deep neural networks can generalize better to unseen feature combinations through
low-dimensional dense embeddings learned for the sparse features. However, deep neural networks with embeddings can over-generalize
and recommend less relevant items when the user-item interactions are sparse and high-rank. In this paper, we present Wide & Deep learning—jointly trained wide
linear models and deep neural networks—to combine the benefits of memorization and generalization for recommender systems. We
productionized and evaluated the system on Google Play, a commercial mobile app store with over one billion active users and over one million apps. Online
experiment results show that Wide & Deep significantly increased app acquisitions compared with wide-only and deep-only models. We have also open-sourced our
implementation in TensorFlow.
Wide And Deep Model
- Linear model: memorization, relevance
- Deep model: generalization, diversity

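What is memorization, and what is generalization?
From the Wide and Deep talk at the TensorFlow Summit (tensorflow summit wide and deep video)
How do people learn?
Pigeons can fly.
Hawks can fly.
People acquire knowledge by observing everyday life,
and are good at memorizing pairs of otherwise unrelated properties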

But however good one's memory is, nobody can see every animal
or remember them all

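The most important stage of learning: generalization
Generalization summarizes what I have experienced into very simple properties
So through generalization, one can know about things never seen or learned before.
Animals with wings can fly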

It has wings,
but it cannot fly
However, generalization has a problem: it does not apply to every situation

What happens if we combine memorization and generalization?

Wide And Deep in practice
Applied to the Google Play Store recommender system, where it improved performance
The improvement was about 4%

Wide and deep source analysis
# Wide model feature
education = tf.contrib.layers.sparse_column_with_hash_bucket("education", hash_bucket_size=1000)
relationship = tf.contrib.layers.sparse_column_with_hash_bucket("relationship", hash_bucket_size=100)
# deep model feature embedding
tf.contrib.layers.embedding_column(workclass, dimension=8),
tf.contrib.layers.embedding_column(education, dimension=8),
# model structure
tf.contrib.learn.DNNLinearCombinedClassifier(... dnn_hidden_units=[100, 50] … )
A wide and deep model can be implemented with three APIs

Wide and deep source example
Wide and Deep Learning example
Hands-on exercise
