SlideShare uma empresa Scribd logo
1 de 13
ckan 2.0: 
Harvesting from other sources 
Internship @ Academia Sinica 
Report #3 
Presenter: Cheng-Jen Lee (Sol) 
Email: cjlee AT iis.sinica.edu.tw 
This work is licensed under a 
Creative Commons Attribution-ShareAlike 3.0 Taiwan License.
Agenda 
● Harvesters 
– Usage: manually and automatically 
– Custom harvester 
– some issues 
● Linked Data and RDF 
Oct 13, 2014 2
Harvesters 
● ckanext-harvest 
– Remote harvesting extension 
● Source Type 
– CSW 
– csv/xls 
– WAF 
– custom 
Oct 13, 2014 3
Harvesters 
Oct 13, 2014 4
Harvesters 
● Usage (manually) 
– (pyenv) $ paster --plugin=ckanext-harvest 
harvester gather_consumer -c 
/etc/ckan/default/production.ini 
– (pyenv) $ paster --plugin=ckanext-harvest 
harvester fetch_consumer -c 
/etc/ckan/default/production.ini 
– (pyenv) $ paster --plugin=ckanext-harvest 
harvester run -c 
/etc/ckan/default/production.ini 
Oct 13, 2014 5
Harvesters 
● Usage (automatically) 
– Supervisor (for gather & fetch consumer) 
– Cron (for run) 
● Supervisor (with profile) 
– $ sudo supervisorctl reread 
– $ sudo supervisorctl add ckan_gather_consumer 
– $ sudo supervisorctl add ckan_fetch_consumer 
– $ sudo supervisorctl start ckan_gather_consumer 
– $ sudo supervisorctl start ckan_fetch_consumer 
Oct 13, 2014 6
Harvesters 
● Custom harvester 
– We can implement the harvester interface to 
perform harvesting operations 
– The process take place on three steps: 
● gather: get the identification 
● fetch: fetch the contents 
● import: create ckan package(dataset) 
– Implementation 
● https://github.com/u10313335/ckanext-harvest/ 
blob/master/ckanext/harvest/harvesters/srda 
harvester.py 
Oct 13, 2014 7
Harvesters 
● Harvesting Interface 
from ckan.plugins.core import SingletonPlugin, implements 
from ckanext.harvest.interfaces import IHarvester 
class MyHarvester(SingletonPlugin): 
implements(IHarvester) 
def get_original_url(self, harvest_object_id): 
:param harvest_object_id: HarvestObject id 
:returns: A string with the URL to the original document 
def gather_stage(self, harvest_job): 
:param harvest_job: HarvestJob object 
:returns: A list of HarvestObject ids 
def fetch_stage(self, harvest_object): 
:param harvest_object: HarvestObject object 
:returns: True if everything went right, False if errors were found 
def import_stage(self, harvest_object): 
Oct 13, 2014 8 
:param harvest_object: HarvestObject object 
:returns: True if everything went right, False if errors were found
Harvesters 
● Some issues 
– Title with non-ASCII characters 
– Useless update check 
– TGOS CSW: failed in gather stage 
● Caused by OWSLib 
– Harvest source varies 
● We should modified the extension for properly 
harvesting 
● Modified version available 
– On Github: 
https://github.com/u10313335/ckanext-harvest 
Oct 13, 2014 9
Linked Data and RDF 
● Resource Description Framework 
– a family of W3C specifications 
– a metadata data model 
– based on XML, URI 
Oct 13, 2014 10 
Source: http://techserviceslibrary.blogspot.tw/2011/04/rdf-resource-description.html
Linked Data and RDF 
● Vocabularies 
– DCAT and Dublin Core 
● Two way to get RDF metadata 
– curl -L -H "Accept:application/rdf+xml" 
http://thedatahub.org/dataset/gold-prices 
– curl -L http://thedatahub.org/dataset/gold-prices. 
rdf 
Oct 13, 2014 11
Documents 
● Read the Docs: 
– https://readthedocs.org/projects/ckan-docs-tw/ 
Oct 13, 2014 12
Thanks for your attention! 
Any Q? 
Oct 13, 2014 13

Mais conteúdo relacionado

Mais procurados

Room 2 - 3 - Nguyễn Hoài Nam & Nguyễn Việt Hùng - Terraform & Pulumi Comparin...
Room 2 - 3 - Nguyễn Hoài Nam & Nguyễn Việt Hùng - Terraform & Pulumi Comparin...Room 2 - 3 - Nguyễn Hoài Nam & Nguyễn Việt Hùng - Terraform & Pulumi Comparin...
Room 2 - 3 - Nguyễn Hoài Nam & Nguyễn Việt Hùng - Terraform & Pulumi Comparin...Vietnam Open Infrastructure User Group
 
A la découverte de kubernetes
A la découverte de kubernetesA la découverte de kubernetes
A la découverte de kubernetesJulien Maitrehenry
 
MS 빅데이터 서비스 및 게임사 PoC 사례 소개
MS 빅데이터 서비스 및 게임사 PoC 사례 소개MS 빅데이터 서비스 및 게임사 PoC 사례 소개
MS 빅데이터 서비스 및 게임사 PoC 사례 소개I Goo Lee
 
오픈소스GIS를 활용한 서버기반 공간분석과 시각화
오픈소스GIS를 활용한 서버기반 공간분석과 시각화오픈소스GIS를 활용한 서버기반 공간분석과 시각화
오픈소스GIS를 활용한 서버기반 공간분석과 시각화MinPa Lee
 
Understanding LLM LLMOps & MLOps_open version.pdf
Understanding LLM LLMOps & MLOps_open version.pdfUnderstanding LLM LLMOps & MLOps_open version.pdf
Understanding LLM LLMOps & MLOps_open version.pdfChun Myung Kyu
 
Local Apache NiFi Processor Debug
Local Apache NiFi Processor DebugLocal Apache NiFi Processor Debug
Local Apache NiFi Processor DebugDeon Huang
 
공간데이터베이스(Spatial db)
공간데이터베이스(Spatial db)공간데이터베이스(Spatial db)
공간데이터베이스(Spatial db)H.J. SIM
 
[오픈소스컨설팅] Open Stack Ceph, Neutron, HA, Multi-Region
[오픈소스컨설팅] Open Stack Ceph, Neutron, HA, Multi-Region[오픈소스컨설팅] Open Stack Ceph, Neutron, HA, Multi-Region
[오픈소스컨설팅] Open Stack Ceph, Neutron, HA, Multi-RegionJi-Woong Choi
 
Alphorm.com-Formation MongoDB Administration
Alphorm.com-Formation MongoDB AdministrationAlphorm.com-Formation MongoDB Administration
Alphorm.com-Formation MongoDB AdministrationAlphorm
 
[오픈소스컨설팅]쿠버네티스를 활용한 개발환경 구축
[오픈소스컨설팅]쿠버네티스를 활용한 개발환경 구축[오픈소스컨설팅]쿠버네티스를 활용한 개발환경 구축
[오픈소스컨설팅]쿠버네티스를 활용한 개발환경 구축Ji-Woong Choi
 
"Relax and Recover", an Open Source mksysb for Linux on Power
"Relax and Recover", an Open Source mksysb for Linux on Power"Relax and Recover", an Open Source mksysb for Linux on Power
"Relax and Recover", an Open Source mksysb for Linux on PowerSebastien Chabrolles
 
Elastic Stack & Data pipeline
Elastic Stack & Data pipelineElastic Stack & Data pipeline
Elastic Stack & Data pipelineJongho Woo
 
Airflow를 이용한 데이터 Workflow 관리
Airflow를 이용한  데이터 Workflow 관리Airflow를 이용한  데이터 Workflow 관리
Airflow를 이용한 데이터 Workflow 관리YoungHeon (Roy) Kim
 
Room 2 - 4 - Juncheng Anthony Lin - Redhat - A Practical Approach to Traditio...
Room 2 - 4 - Juncheng Anthony Lin - Redhat - A Practical Approach to Traditio...Room 2 - 4 - Juncheng Anthony Lin - Redhat - A Practical Approach to Traditio...
Room 2 - 4 - Juncheng Anthony Lin - Redhat - A Practical Approach to Traditio...Vietnam Open Infrastructure User Group
 
Impala + Kudu を用いたデータウェアハウス構築の勘所 (仮)
Impala + Kudu を用いたデータウェアハウス構築の勘所 (仮)Impala + Kudu を用いたデータウェアハウス構築の勘所 (仮)
Impala + Kudu を用いたデータウェアハウス構築の勘所 (仮)Cloudera Japan
 
공간정보아카데미 - Day1 오픈소스개발 일반
공간정보아카데미 - Day1 오픈소스개발 일반공간정보아카데미 - Day1 오픈소스개발 일반
공간정보아카데미 - Day1 오픈소스개발 일반BJ Jang
 
Alluxio: Data Orchestration on Multi-Cloud
Alluxio: Data Orchestration on Multi-CloudAlluxio: Data Orchestration on Multi-Cloud
Alluxio: Data Orchestration on Multi-CloudJinwook Chung
 
[OpenInfra Days Korea 2018] (Track 4) - Grafana를 이용한 OpenStack 클라우드 성능 모니터링
[OpenInfra Days Korea 2018] (Track 4) - Grafana를 이용한 OpenStack 클라우드 성능 모니터링[OpenInfra Days Korea 2018] (Track 4) - Grafana를 이용한 OpenStack 클라우드 성능 모니터링
[OpenInfra Days Korea 2018] (Track 4) - Grafana를 이용한 OpenStack 클라우드 성능 모니터링OpenStack Korea Community
 

Mais procurados (20)

Room 2 - 3 - Nguyễn Hoài Nam & Nguyễn Việt Hùng - Terraform & Pulumi Comparin...
Room 2 - 3 - Nguyễn Hoài Nam & Nguyễn Việt Hùng - Terraform & Pulumi Comparin...Room 2 - 3 - Nguyễn Hoài Nam & Nguyễn Việt Hùng - Terraform & Pulumi Comparin...
Room 2 - 3 - Nguyễn Hoài Nam & Nguyễn Việt Hùng - Terraform & Pulumi Comparin...
 
A la découverte de kubernetes
A la découverte de kubernetesA la découverte de kubernetes
A la découverte de kubernetes
 
MS 빅데이터 서비스 및 게임사 PoC 사례 소개
MS 빅데이터 서비스 및 게임사 PoC 사례 소개MS 빅데이터 서비스 및 게임사 PoC 사례 소개
MS 빅데이터 서비스 및 게임사 PoC 사례 소개
 
오픈소스GIS를 활용한 서버기반 공간분석과 시각화
오픈소스GIS를 활용한 서버기반 공간분석과 시각화오픈소스GIS를 활용한 서버기반 공간분석과 시각화
오픈소스GIS를 활용한 서버기반 공간분석과 시각화
 
Understanding LLM LLMOps & MLOps_open version.pdf
Understanding LLM LLMOps & MLOps_open version.pdfUnderstanding LLM LLMOps & MLOps_open version.pdf
Understanding LLM LLMOps & MLOps_open version.pdf
 
Local Apache NiFi Processor Debug
Local Apache NiFi Processor DebugLocal Apache NiFi Processor Debug
Local Apache NiFi Processor Debug
 
공간데이터베이스(Spatial db)
공간데이터베이스(Spatial db)공간데이터베이스(Spatial db)
공간데이터베이스(Spatial db)
 
[오픈소스컨설팅] Open Stack Ceph, Neutron, HA, Multi-Region
[오픈소스컨설팅] Open Stack Ceph, Neutron, HA, Multi-Region[오픈소스컨설팅] Open Stack Ceph, Neutron, HA, Multi-Region
[오픈소스컨설팅] Open Stack Ceph, Neutron, HA, Multi-Region
 
6 QGIS layout
6 QGIS layout6 QGIS layout
6 QGIS layout
 
Alphorm.com-Formation MongoDB Administration
Alphorm.com-Formation MongoDB AdministrationAlphorm.com-Formation MongoDB Administration
Alphorm.com-Formation MongoDB Administration
 
[오픈소스컨설팅]쿠버네티스를 활용한 개발환경 구축
[오픈소스컨설팅]쿠버네티스를 활용한 개발환경 구축[오픈소스컨설팅]쿠버네티스를 활용한 개발환경 구축
[오픈소스컨설팅]쿠버네티스를 활용한 개발환경 구축
 
"Relax and Recover", an Open Source mksysb for Linux on Power
"Relax and Recover", an Open Source mksysb for Linux on Power"Relax and Recover", an Open Source mksysb for Linux on Power
"Relax and Recover", an Open Source mksysb for Linux on Power
 
Elastic Stack & Data pipeline
Elastic Stack & Data pipelineElastic Stack & Data pipeline
Elastic Stack & Data pipeline
 
Kubernetes
Kubernetes Kubernetes
Kubernetes
 
Airflow를 이용한 데이터 Workflow 관리
Airflow를 이용한  데이터 Workflow 관리Airflow를 이용한  데이터 Workflow 관리
Airflow를 이용한 데이터 Workflow 관리
 
Room 2 - 4 - Juncheng Anthony Lin - Redhat - A Practical Approach to Traditio...
Room 2 - 4 - Juncheng Anthony Lin - Redhat - A Practical Approach to Traditio...Room 2 - 4 - Juncheng Anthony Lin - Redhat - A Practical Approach to Traditio...
Room 2 - 4 - Juncheng Anthony Lin - Redhat - A Practical Approach to Traditio...
 
Impala + Kudu を用いたデータウェアハウス構築の勘所 (仮)
Impala + Kudu を用いたデータウェアハウス構築の勘所 (仮)Impala + Kudu を用いたデータウェアハウス構築の勘所 (仮)
Impala + Kudu を用いたデータウェアハウス構築の勘所 (仮)
 
공간정보아카데미 - Day1 오픈소스개발 일반
공간정보아카데미 - Day1 오픈소스개발 일반공간정보아카데미 - Day1 오픈소스개발 일반
공간정보아카데미 - Day1 오픈소스개발 일반
 
Alluxio: Data Orchestration on Multi-Cloud
Alluxio: Data Orchestration on Multi-CloudAlluxio: Data Orchestration on Multi-Cloud
Alluxio: Data Orchestration on Multi-Cloud
 
[OpenInfra Days Korea 2018] (Track 4) - Grafana를 이용한 OpenStack 클라우드 성능 모니터링
[OpenInfra Days Korea 2018] (Track 4) - Grafana를 이용한 OpenStack 클라우드 성능 모니터링[OpenInfra Days Korea 2018] (Track 4) - Grafana를 이용한 OpenStack 클라우드 성능 모니터링
[OpenInfra Days Korea 2018] (Track 4) - Grafana를 이용한 OpenStack 클라우드 성능 모니터링
 

Destaque

Open Data and CKAN Data Catalogues
Open Data and CKAN Data CataloguesOpen Data and CKAN Data Catalogues
Open Data and CKAN Data Cataloguesdavid-read
 
DCAT-AP exchanging metadata
DCAT-AP exchanging metadataDCAT-AP exchanging metadata
DCAT-AP exchanging metadataBart Hanssens
 
CKANCon 2016 & IODC16
CKANCon 2016 & IODC16CKANCon 2016 & IODC16
CKANCon 2016 & IODC16Chengjen Lee
 
Ckan tutorial odw2013 131109
Ckan tutorial odw2013 131109Ckan tutorial odw2013 131109
Ckan tutorial odw2013 131109Chengjen Lee
 
12 SQL On-Hadoop Tools
12 SQL On-Hadoop Tools12 SQL On-Hadoop Tools
12 SQL On-Hadoop ToolsXplenty
 
Final version sql over hadoop ver1
Final version sql over hadoop ver1Final version sql over hadoop ver1
Final version sql over hadoop ver1Sudheesh Narayanan
 
Customizing CKAN
Customizing CKANCustomizing CKAN
Customizing CKANOKCon2013
 
Design for a Distributed Name Node
Design for a Distributed Name NodeDesign for a Distributed Name Node
Design for a Distributed Name NodeAaron Cordova
 
Taming YARN @ Hadoop Conference Japan 2014
Taming YARN @ Hadoop Conference Japan 2014Taming YARN @ Hadoop Conference Japan 2014
Taming YARN @ Hadoop Conference Japan 2014Tsuyoshi OZAWA
 
Security needs in Hadoop’s Current and Future – How Apache Ranger can help?
Security needs in Hadoop’s Current and Future – How Apache Ranger can help?Security needs in Hadoop’s Current and Future – How Apache Ranger can help?
Security needs in Hadoop’s Current and Future – How Apache Ranger can help?DataWorks Summit
 
Hadoop REST API Security with Apache Knox Gateway
Hadoop REST API Security with Apache Knox GatewayHadoop REST API Security with Apache Knox Gateway
Hadoop REST API Security with Apache Knox GatewayDataWorks Summit
 
Hadoop Operations: How to Secure and Control Cluster Access
Hadoop Operations: How to Secure and Control Cluster AccessHadoop Operations: How to Secure and Control Cluster Access
Hadoop Operations: How to Secure and Control Cluster AccessCloudera, Inc.
 
Quick Linked Data Introduction
Quick Linked Data IntroductionQuick Linked Data Introduction
Quick Linked Data IntroductionMichael Hausenblas
 
Hadoop benchmark: Evaluating Cloudera, Hortonworks, and MapR
Hadoop benchmark: Evaluating Cloudera, Hortonworks, and MapRHadoop benchmark: Evaluating Cloudera, Hortonworks, and MapR
Hadoop benchmark: Evaluating Cloudera, Hortonworks, and MapRDouglas Bernardini
 
30 Minute Guide to RDF and Linked Data
30 Minute Guide to RDF and Linked Data30 Minute Guide to RDF and Linked Data
30 Minute Guide to RDF and Linked DataIan Davis
 
Big data, map reduce and beyond
Big data, map reduce and beyondBig data, map reduce and beyond
Big data, map reduce and beyonddatasalt
 
Workflow Engines for Hadoop
Workflow Engines for HadoopWorkflow Engines for Hadoop
Workflow Engines for HadoopJoe Crobak
 
Hadoop & Big Data benchmarking
Hadoop & Big Data benchmarkingHadoop & Big Data benchmarking
Hadoop & Big Data benchmarkingBart Vandewoestyne
 

Destaque (20)

Open Data and CKAN Data Catalogues
Open Data and CKAN Data CataloguesOpen Data and CKAN Data Catalogues
Open Data and CKAN Data Catalogues
 
DCAT-AP exchanging metadata
DCAT-AP exchanging metadataDCAT-AP exchanging metadata
DCAT-AP exchanging metadata
 
CKANCon 2016 & IODC16
CKANCon 2016 & IODC16CKANCon 2016 & IODC16
CKANCon 2016 & IODC16
 
Ckan tutorial odw2013 131109
Ckan tutorial odw2013 131109Ckan tutorial odw2013 131109
Ckan tutorial odw2013 131109
 
12 SQL On-Hadoop Tools
12 SQL On-Hadoop Tools12 SQL On-Hadoop Tools
12 SQL On-Hadoop Tools
 
Final version sql over hadoop ver1
Final version sql over hadoop ver1Final version sql over hadoop ver1
Final version sql over hadoop ver1
 
Customizing CKAN
Customizing CKANCustomizing CKAN
Customizing CKAN
 
Design for a Distributed Name Node
Design for a Distributed Name NodeDesign for a Distributed Name Node
Design for a Distributed Name Node
 
Taming YARN @ Hadoop Conference Japan 2014
Taming YARN @ Hadoop Conference Japan 2014Taming YARN @ Hadoop Conference Japan 2014
Taming YARN @ Hadoop Conference Japan 2014
 
DCAT: a tale of exchanging metadata
DCAT: a tale of exchanging metadataDCAT: a tale of exchanging metadata
DCAT: a tale of exchanging metadata
 
Security needs in Hadoop’s Current and Future – How Apache Ranger can help?
Security needs in Hadoop’s Current and Future – How Apache Ranger can help?Security needs in Hadoop’s Current and Future – How Apache Ranger can help?
Security needs in Hadoop’s Current and Future – How Apache Ranger can help?
 
Hadoop REST API Security with Apache Knox Gateway
Hadoop REST API Security with Apache Knox GatewayHadoop REST API Security with Apache Knox Gateway
Hadoop REST API Security with Apache Knox Gateway
 
Hadoop Operations: How to Secure and Control Cluster Access
Hadoop Operations: How to Secure and Control Cluster AccessHadoop Operations: How to Secure and Control Cluster Access
Hadoop Operations: How to Secure and Control Cluster Access
 
Quick Linked Data Introduction
Quick Linked Data IntroductionQuick Linked Data Introduction
Quick Linked Data Introduction
 
Linked Data Tutorial
Linked Data TutorialLinked Data Tutorial
Linked Data Tutorial
 
Hadoop benchmark: Evaluating Cloudera, Hortonworks, and MapR
Hadoop benchmark: Evaluating Cloudera, Hortonworks, and MapRHadoop benchmark: Evaluating Cloudera, Hortonworks, and MapR
Hadoop benchmark: Evaluating Cloudera, Hortonworks, and MapR
 
30 Minute Guide to RDF and Linked Data
30 Minute Guide to RDF and Linked Data30 Minute Guide to RDF and Linked Data
30 Minute Guide to RDF and Linked Data
 
Big data, map reduce and beyond
Big data, map reduce and beyondBig data, map reduce and beyond
Big data, map reduce and beyond
 
Workflow Engines for Hadoop
Workflow Engines for HadoopWorkflow Engines for Hadoop
Workflow Engines for Hadoop
 
Hadoop & Big Data benchmarking
Hadoop & Big Data benchmarkingHadoop & Big Data benchmarking
Hadoop & Big Data benchmarking
 

Semelhante a ckan 2.0: Harvesting from other sources

Symfony Under the Hood
Symfony Under the HoodSymfony Under the Hood
Symfony Under the HoodeZ Systems
 
#GeodeSummit - Spring Data GemFire API Current and Future
#GeodeSummit - Spring Data GemFire API Current and Future#GeodeSummit - Spring Data GemFire API Current and Future
#GeodeSummit - Spring Data GemFire API Current and FuturePivotalOpenSourceHub
 
Angular Intermediate
Angular IntermediateAngular Intermediate
Angular IntermediateLinkMe Srl
 
Django deployment best practices
Django deployment best practicesDjango deployment best practices
Django deployment best practicesErik LaBianca
 
API Testing Presentations.pptx
API Testing Presentations.pptxAPI Testing Presentations.pptx
API Testing Presentations.pptxManmitSalunke
 
Using JHipster for generating Angular/Spring Boot apps
Using JHipster for generating Angular/Spring Boot appsUsing JHipster for generating Angular/Spring Boot apps
Using JHipster for generating Angular/Spring Boot appsYakov Fain
 
Symfony tips and tricks
Symfony tips and tricksSymfony tips and tricks
Symfony tips and tricksJavier Eguiluz
 
Magento Attributes - Fresh View
Magento Attributes - Fresh ViewMagento Attributes - Fresh View
Magento Attributes - Fresh ViewAlex Gotgelf
 
Symfony internals [english]
Symfony internals [english]Symfony internals [english]
Symfony internals [english]Raul Fraile
 
$.get, a Prime on Data Fetching
$.get, a Prime on Data Fetching$.get, a Prime on Data Fetching
$.get, a Prime on Data FetchingJosh Black
 
A Series of Fortunate Events: Building an Operator in Java
A Series of Fortunate Events: Building an Operator in JavaA Series of Fortunate Events: Building an Operator in Java
A Series of Fortunate Events: Building an Operator in JavaVMware Tanzu
 
Deploying Symfony | symfony.cat
Deploying Symfony | symfony.catDeploying Symfony | symfony.cat
Deploying Symfony | symfony.catPablo Godel
 
Puppet Performance Profiling
Puppet Performance ProfilingPuppet Performance Profiling
Puppet Performance Profilingripienaar
 
Magento Live Australia 2016: Request Flow
Magento Live Australia 2016: Request FlowMagento Live Australia 2016: Request Flow
Magento Live Australia 2016: Request FlowVrann Tulika
 
Creating REST Applications with the Slim Micro-Framework by Vikram Vaswani
Creating REST Applications with the Slim Micro-Framework by Vikram VaswaniCreating REST Applications with the Slim Micro-Framework by Vikram Vaswani
Creating REST Applications with the Slim Micro-Framework by Vikram Vaswanivvaswani
 
Test in action week 3
Test in action   week 3Test in action   week 3
Test in action week 3Yi-Huan Chan
 
Gearman - Job Queue
Gearman - Job QueueGearman - Job Queue
Gearman - Job QueueDiego Lewin
 
Android app performance
Android app performanceAndroid app performance
Android app performanceSaksham Keshri
 

Semelhante a ckan 2.0: Harvesting from other sources (20)

Symfony Under the Hood
Symfony Under the HoodSymfony Under the Hood
Symfony Under the Hood
 
#GeodeSummit - Spring Data GemFire API Current and Future
#GeodeSummit - Spring Data GemFire API Current and Future#GeodeSummit - Spring Data GemFire API Current and Future
#GeodeSummit - Spring Data GemFire API Current and Future
 
Angular Intermediate
Angular IntermediateAngular Intermediate
Angular Intermediate
 
Django deployment best practices
Django deployment best practicesDjango deployment best practices
Django deployment best practices
 
API Testing Presentations.pptx
API Testing Presentations.pptxAPI Testing Presentations.pptx
API Testing Presentations.pptx
 
Using JHipster for generating Angular/Spring Boot apps
Using JHipster for generating Angular/Spring Boot appsUsing JHipster for generating Angular/Spring Boot apps
Using JHipster for generating Angular/Spring Boot apps
 
Symfony tips and tricks
Symfony tips and tricksSymfony tips and tricks
Symfony tips and tricks
 
Magento Attributes - Fresh View
Magento Attributes - Fresh ViewMagento Attributes - Fresh View
Magento Attributes - Fresh View
 
Symfony internals [english]
Symfony internals [english]Symfony internals [english]
Symfony internals [english]
 
$.get, a Prime on Data Fetching
$.get, a Prime on Data Fetching$.get, a Prime on Data Fetching
$.get, a Prime on Data Fetching
 
A Series of Fortunate Events: Building an Operator in Java
A Series of Fortunate Events: Building an Operator in JavaA Series of Fortunate Events: Building an Operator in Java
A Series of Fortunate Events: Building an Operator in Java
 
Deploying Symfony | symfony.cat
Deploying Symfony | symfony.catDeploying Symfony | symfony.cat
Deploying Symfony | symfony.cat
 
Puppet Performance Profiling
Puppet Performance ProfilingPuppet Performance Profiling
Puppet Performance Profiling
 
Intro to PSGI and Plack
Intro to PSGI and PlackIntro to PSGI and Plack
Intro to PSGI and Plack
 
Magento Live Australia 2016: Request Flow
Magento Live Australia 2016: Request FlowMagento Live Australia 2016: Request Flow
Magento Live Australia 2016: Request Flow
 
Creating REST Applications with the Slim Micro-Framework by Vikram Vaswani
Creating REST Applications with the Slim Micro-Framework by Vikram VaswaniCreating REST Applications with the Slim Micro-Framework by Vikram Vaswani
Creating REST Applications with the Slim Micro-Framework by Vikram Vaswani
 
Using Maven2
Using Maven2Using Maven2
Using Maven2
 
Test in action week 3
Test in action   week 3Test in action   week 3
Test in action week 3
 
Gearman - Job Queue
Gearman - Job QueueGearman - Job Queue
Gearman - Job Queue
 
Android app performance
Android app performanceAndroid app performance
Android app performance
 

Mais de Chengjen Lee

Preserving Collaborative Documents in Contemporary Events
Preserving Collaborative Documents in Contemporary EventsPreserving Collaborative Documents in Contemporary Events
Preserving Collaborative Documents in Contemporary EventsChengjen Lee
 
Retooling a Research Data Repository: data.depositar.io
Retooling a Research Data Repository: data.depositar.ioRetooling a Research Data Repository: data.depositar.io
Retooling a Research Data Repository: data.depositar.ioChengjen Lee
 
跨領域區域研究資料集 (data.depositar.io): CKAN 應用介紹
跨領域區域研究資料集 (data.depositar.io): CKAN 應用介紹跨領域區域研究資料集 (data.depositar.io): CKAN 應用介紹
跨領域區域研究資料集 (data.depositar.io): CKAN 應用介紹Chengjen Lee
 
“Open Data Web” – A Linked Open Data Repository Built with CKAN
“Open Data Web” – A Linked Open Data Repository Built with CKAN“Open Data Web” – A Linked Open Data Repository Built with CKAN
“Open Data Web” – A Linked Open Data Repository Built with CKANChengjen Lee
 
CKAN 技術介紹 (開發篇)
CKAN 技術介紹 (開發篇)CKAN 技術介紹 (開發篇)
CKAN 技術介紹 (開發篇)Chengjen Lee
 
CKAN 技術介紹 (基礎篇)
CKAN 技術介紹 (基礎篇)CKAN 技術介紹 (基礎篇)
CKAN 技術介紹 (基礎篇)Chengjen Lee
 
將 Open Data 放上 Open Source Platforms: 開源資料入口平台 CKAN 開發經驗分享
將 Open Data 放上 Open Source Platforms: 開源資料入口平台 CKAN 開發經驗分享將 Open Data 放上 Open Source Platforms: 開源資料入口平台 CKAN 開發經驗分享
將 Open Data 放上 Open Source Platforms: 開源資料入口平台 CKAN 開發經驗分享Chengjen Lee
 
CKAN 應用介紹 - 以台江計畫為例
CKAN 應用介紹 - 以台江計畫為例CKAN 應用介紹 - 以台江計畫為例
CKAN 應用介紹 - 以台江計畫為例Chengjen Lee
 
ckan 2.0 Introduction (20140618 updated)
ckan 2.0 Introduction (20140618 updated)ckan 2.0 Introduction (20140618 updated)
ckan 2.0 Introduction (20140618 updated)Chengjen Lee
 
ckan 2.0 Introduction (20140522 updated)
ckan 2.0 Introduction  (20140522 updated)ckan 2.0 Introduction  (20140522 updated)
ckan 2.0 Introduction (20140522 updated)Chengjen Lee
 
Introduction to Pelican
Introduction to PelicanIntroduction to Pelican
Introduction to PelicanChengjen Lee
 
ckan 2.0: a deeper look
ckan 2.0: a deeper lookckan 2.0: a deeper look
ckan 2.0: a deeper lookChengjen Lee
 
ckan 2.0 Introduction
ckan 2.0 Introductionckan 2.0 Introduction
ckan 2.0 IntroductionChengjen Lee
 

Mais de Chengjen Lee (15)

Preserving Collaborative Documents in Contemporary Events
Preserving Collaborative Documents in Contemporary EventsPreserving Collaborative Documents in Contemporary Events
Preserving Collaborative Documents in Contemporary Events
 
Retooling a Research Data Repository: data.depositar.io
Retooling a Research Data Repository: data.depositar.ioRetooling a Research Data Repository: data.depositar.io
Retooling a Research Data Repository: data.depositar.io
 
跨領域區域研究資料集 (data.depositar.io): CKAN 應用介紹
跨領域區域研究資料集 (data.depositar.io): CKAN 應用介紹跨領域區域研究資料集 (data.depositar.io): CKAN 應用介紹
跨領域區域研究資料集 (data.depositar.io): CKAN 應用介紹
 
“Open Data Web” – A Linked Open Data Repository Built with CKAN
“Open Data Web” – A Linked Open Data Repository Built with CKAN“Open Data Web” – A Linked Open Data Repository Built with CKAN
“Open Data Web” – A Linked Open Data Repository Built with CKAN
 
CKAN 技術介紹 (開發篇)
CKAN 技術介紹 (開發篇)CKAN 技術介紹 (開發篇)
CKAN 技術介紹 (開發篇)
 
CKAN 技術介紹 (基礎篇)
CKAN 技術介紹 (基礎篇)CKAN 技術介紹 (基礎篇)
CKAN 技術介紹 (基礎篇)
 
將 Open Data 放上 Open Source Platforms: 開源資料入口平台 CKAN 開發經驗分享
將 Open Data 放上 Open Source Platforms: 開源資料入口平台 CKAN 開發經驗分享將 Open Data 放上 Open Source Platforms: 開源資料入口平台 CKAN 開發經驗分享
將 Open Data 放上 Open Source Platforms: 開源資料入口平台 CKAN 開發經驗分享
 
CKAN 應用介紹 - 以台江計畫為例
CKAN 應用介紹 - 以台江計畫為例CKAN 應用介紹 - 以台江計畫為例
CKAN 應用介紹 - 以台江計畫為例
 
ckan 2.0 Introduction (20140618 updated)
ckan 2.0 Introduction (20140618 updated)ckan 2.0 Introduction (20140618 updated)
ckan 2.0 Introduction (20140618 updated)
 
ckan 2.0 Introduction (20140522 updated)
ckan 2.0 Introduction  (20140522 updated)ckan 2.0 Introduction  (20140522 updated)
ckan 2.0 Introduction (20140522 updated)
 
Report 140227
Report 140227Report 140227
Report 140227
 
Report 140213
Report 140213Report 140213
Report 140213
 
Introduction to Pelican
Introduction to PelicanIntroduction to Pelican
Introduction to Pelican
 
ckan 2.0: a deeper look
ckan 2.0: a deeper lookckan 2.0: a deeper look
ckan 2.0: a deeper look
 
ckan 2.0 Introduction
ckan 2.0 Introductionckan 2.0 Introduction
ckan 2.0 Introduction
 

Último

A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 

Último (20)

A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 

ckan 2.0: Harvesting from other sources

  • 1. ckan 2.0: Harvesting from other sources Internship @ Academia Sinica Report #3 Presenter: Cheng-Jen Lee (Sol) Email: cjlee AT iis.sinica.edu.tw This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Taiwan License.
  • 2. Agenda ● Harvesters – Usage: manually and automatically – Custom harvester – some issues ● Linked Data and RDF Oct 13, 2014 2
  • 3. Harvesters ● ckanext-harvest – Remote harvesting extension ● Source Type – CSW – csv/xls – WAF – custom Oct 13, 2014 3
  • 5. Harvesters ● Usage (manually) – (pyenv) $ paster --plugin=ckanext-harvest harvester gather_consumer -c /etc/ckan/default/production.ini – (pyenv) $ paster --plugin=ckanext-harvest harvester fetch_consumer -c /etc/ckan/default/production.ini – (pyenv) $ paster --plugin=ckanext-harvest harvester run -c /etc/ckan/default/production.ini Oct 13, 2014 5
  • 6. Harvesters ● Usage (automatically) – Supervisor (for gather & fetch consumer) – Cron (for run) ● Supervisor (with profile) – $ sudo supervisorctl reread – $ sudo supervisorctl add ckan_gather_consumer – $ sudo supervisorctl add ckan_fetch_consumer – $ sudo supervisorctl start ckan_gather_consumer – $ sudo supervisorctl start ckan_fetch_consumer Oct 13, 2014 6
  • 7. Harvesters ● Custom harvester – We can implement the harvester interface to perform harvesting operations – The process take place on three steps: ● gather: get the identification ● fetch: fetch the contents ● import: create ckan package(dataset) – Implementation ● https://github.com/u10313335/ckanext-harvest/ blob/master/ckanext/harvest/harvesters/srda harvester.py Oct 13, 2014 7
  • 8. Harvesters ● Harvesting Interface from ckan.plugins.core import SingletonPlugin, implements from ckanext.harvest.interfaces import IHarvester class MyHarvester(SingletonPlugin): implements(IHarvester) def get_original_url(self, harvest_object_id): :param harvest_object_id: HarvestObject id :returns: A string with the URL to the original document def gather_stage(self, harvest_job): :param harvest_job: HarvestJob object :returns: A list of HarvestObject ids def fetch_stage(self, harvest_object): :param harvest_object: HarvestObject object :returns: True if everything went right, False if errors were found def import_stage(self, harvest_object): Oct 13, 2014 8 :param harvest_object: HarvestObject object :returns: True if everything went right, False if errors were found
  • 9. Harvesters ● Some issues – Title with non-ASCII characters – Useless update check – TGOS CSW: failed in gather stage ● Caused by OWSLib – Harvest source varies ● We should modified the extension for properly harvesting ● Modified version available – On Github: https://github.com/u10313335/ckanext-harvest Oct 13, 2014 9
  • 10. Linked Data and RDF ● Resource Description Framework – a family of W3C specifications – a metadata data model – based on XML, URI Oct 13, 2014 10 Source: http://techserviceslibrary.blogspot.tw/2011/04/rdf-resource-description.html
  • 11. Linked Data and RDF ● Vocabularies – DCAT and Dublin Core ● Two way to get RDF metadata – curl -L -H "Accept:application/rdf+xml" http://thedatahub.org/dataset/gold-prices – curl -L http://thedatahub.org/dataset/gold-prices. rdf Oct 13, 2014 11
  • 12. Documents ● Read the Docs: – https://readthedocs.org/projects/ckan-docs-tw/ Oct 13, 2014 12
  • 13. Thanks for your attention! Any Q? Oct 13, 2014 13