A Robust Open-source  GEDCOM Parser Dallan Quass  [email_address] Ryan Knight  [email_address]
What's a GEDCOM?

  0 HEAD
  1 SOUR PAF
  2 NAME Personal Ancestral File
  2 VERS 5.2.18.0
  2 CORP The Church of Jesus Christ of Latter-day Saints
  3 ADDR 50 East North Temple Street
  4 CONT Salt Lake City, UT 84150
  4 CONT USA
  1 DEST Other
  1 DATE 9 Aug 2006
  2 TIME 19:57:47
  1 FILE temp-paf.ged
  1 GEDC
  2 VERS 5.5
  2 FORM LINEAGE-LINKED
  1 CHAR UTF-8
  1 LANG English
  1 SUBM @SUB1@
  0 @SUB1@ SUBM
  1 NAME Dallan Quass
  0 @I1@ INDI
  1 NAME Dallan /Quass/
  2 SURN Quass
  2 GIVN Dallan

If this looks unfamiliar to you, you may not get a lot out of this talk. On the other hand, the purpose of this project is to handle this for you, so you can develop cool projects in genealogy and let this stay unfamiliar to you!
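For readers who do want a peek under the hood: each line in the sample above is a level number, an optional cross-reference id such as @I1@, a tag, and an optional value. The following is a minimal sketch of splitting one line into those parts. The class name GedcomLine and its fields are hypothetical and are not taken from the project's code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** One GEDCOM line: level, optional cross-reference id, tag, optional value. */
final class GedcomLine {
    // level, optional @ID@, tag, then an optional value after a single space
    private static final Pattern LINE =
        Pattern.compile("^\\s*(\\d+)\\s+(?:(@[^@]+@)\\s+)?(\\S+)(?: (.*))?$");

    final int level;
    final String id;    // null when the line carries no @ID@
    final String tag;
    final String value; // null when the line carries no value

    private GedcomLine(int level, String id, String tag, String value) {
        this.level = level;
        this.id = id;
        this.tag = tag;
        this.value = value;
    }

    /** Parses a single line (without its line terminator). */
    static GedcomLine parse(String raw) {
        Matcher m = LINE.matcher(raw);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a GEDCOM line: " + raw);
        }
        return new GedcomLine(Integer.parseInt(m.group(1)), m.group(2), m.group(3), m.group(4));
    }

    public static void main(String[] args) {
        GedcomLine line = GedcomLine.parse("0 @I1@ INDI");
        System.out.println(line.level + " | " + line.id + " | " + line.tag + " | " + line.value);
    }
}
```

Note that a pointer value such as the @SUB1@ in "1 SUBM @SUB1@" ends up in value, not in id, because the id slot only exists between the level and the tag.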
Why is parsing GEDCOMs so hard?
Challenge #1 – Character set detection

  0 HEAD
  1 SOUR PAF
  2 NAME Personal Ancestral File
  2 VERS 5.2.18.0
  2 CORP The Church of Jesus Christ of Latter-day Saints
  3 ADDR 50 East North Temple Street
  4 CONT Salt Lake City, UT 84150
  4 CONT USA
  1 DEST Other
  1 DATE 9 Aug 2006
  2 TIME 19:57:47
  1 FILE temp-paf.ged
  1 GEDC
  2 VERS 5.5
  2 FORM LINEAGE-LINKED
  1 CHAR UTF-8
  1 LANG English
  1 SUBM @SUB1@
  0 @SUB1@ SUBM
  1 NAME Dallan Quass
  0 @I1@ INDI
  1 NAME Dallan /Quass/
  2 SURN Quass
  2 GIVN Dallan

Should be easy, except...
Challenge #1 – Character set detection
Challenge #1 – Character set detection ANSEL
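The header declares its encoding on the "1 CHAR" line, but the bytes have to be decoded before that line can be read. One common workaround, sketched below, is to check for a byte-order mark first and otherwise decode the start of the file as ISO-8859-1, which maps every byte to a character, so the ASCII tags stay readable. The class name CharsetSniffer and the name-to-charset mapping are assumptions made for illustration, not the project's actual logic. The JDK ships no ANSEL charset, so ANSEL (and anything unrecognized) is reported as "no built-in charset" and would need a custom reader.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.Optional;

final class CharsetSniffer {
    /** Returns the value of the header's "1 CHAR ..." line, if one is found. */
    static Optional<String> declaredName(byte[] head) {
        // ISO-8859-1 never fails to decode, so ASCII tags survive for ASCII-compatible files.
        String text = new String(head, StandardCharsets.ISO_8859_1);
        for (String line : text.split("\\R")) {
            String[] parts = line.trim().split("\\s+", 3);
            if (parts.length == 3 && parts[0].equals("1") && parts[1].equals("CHAR")) {
                return Optional.of(parts[2]);
            }
        }
        return Optional.empty();
    }

    /** Picks a JDK charset, or empty when a custom reader (for example ANSEL) is needed. */
    static Optional<Charset> detect(byte[] head) {
        // A byte-order mark is trusted over whatever the header claims.
        if (startsWith(head, 0xEF, 0xBB, 0xBF)) return Optional.of(StandardCharsets.UTF_8);
        if (startsWith(head, 0xFE, 0xFF)) return Optional.of(StandardCharsets.UTF_16BE);
        if (startsWith(head, 0xFF, 0xFE)) return Optional.of(StandardCharsets.UTF_16LE);

        // Assumed mapping from declared names to JDK charsets.
        String name = declaredName(head).orElse("").toUpperCase(Locale.ROOT);
        switch (name) {
            case "UTF-8":   return Optional.of(StandardCharsets.UTF_8);
            case "ASCII":   return Optional.of(StandardCharsets.US_ASCII);
            case "UNICODE": return Optional.of(StandardCharsets.UTF_16);
            default:        return Optional.empty(); // ANSEL, vendor-specific names, or no CHAR line
        }
    }

    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }
}
```

A caller would pass the first few kilobytes of the file to detect and fall back to a dedicated ANSEL decoder when the result is empty. Files whose declared encoding does not match their actual bytes need further heuristics that this sketch does not attempt.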
Challenge #2 – Custom tags: The GEDCOM specification hasn't been updated in a LONG time
Challenge #3 – Misused tags
Shout out: Tim Forsythe, VGed - GEDCOM validator, http://ancestorsnow.blogspot.com/2011/07/vged.html
ALIA
  1 SEX M
  1 ALIA /Ted/
  1 BIRT
SOUR
  0 @N6@ NOTE
  1 CONT adopted surname Termaat
  2 SOUR @S9@
DATA
  2 SOUR @S2149874917@
  3 DATA
  4 DATE 11 Sep 1924
  3 NOTE ...
  3 DATA
  4 TEXT ...
  2 SOUR @S99@
  3 DATA
  4 TEXT William Donald ...
  4 DATE 1 Sep 1997
  2 SOUR @S28@
  3 PAGE Indian Prarie...
  3 QUAY 3
  3 DATE 28 Feb 2005
Challenge #4 – Unused tags Event Phone Event Agency Source Citation Event Type
Challenge #5 – Names
GEDCOM Standard? The code is more what you'd call "guidelines" than actual rules.
Two goals
Goal #1 – Parse GEDCOMs into a de facto object model. De facto: "In fact or in practice; in actual use or existence, regardless of official or legal status." – Wiktionary.org. The model should be straightforward, easy to use, and easy to understand.
Goal #2 – Round-trip: from GEDCOM, to object model, back to GEDCOM, without information loss
Nirvana
There is no Nirvana
But we can get pretty close 94%
How is it done? ???
Object model
People
Extensions
GedML: GEDCOM -> SAX events; ANSEL reader & writer
Parser
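The "GEDCOM -> SAX events" idea can be sketched with the JDK's own SAX interfaces: the level number on each line decides how many elements to close before opening the next one. The class name GedcomSaxEmitter, the root element name GED, and the id attribute are assumptions for illustration; the real GedML code may name and structure things differently.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;

final class GedcomSaxEmitter {
    /** Fires SAX events for already-decoded GEDCOM lines, nesting elements by level. */
    static void emit(List<String> lines, ContentHandler handler) throws SAXException {
        Deque<String> open = new ArrayDeque<>(); // tags of currently open elements
        handler.startDocument();
        handler.startElement("", "GED", "GED", new AttributesImpl()); // assumed root name

        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty()) continue;

            String[] head = line.split("\\s+", 2);
            int level = Integer.parseInt(head[0]);
            String rest = head.length > 1 ? head[1] : "";

            // An @ID@ directly after the level is a cross-reference id, not a value.
            String id = null;
            if (rest.startsWith("@")) {
                String[] idAndRest = rest.split("\\s+", 2);
                id = idAndRest[0];
                rest = idAndRest.length > 1 ? idAndRest[1] : "";
            }
            String[] tagAndValue = rest.split(" ", 2);
            String tag = tagAndValue[0];
            String value = tagAndValue.length > 1 ? tagAndValue[1] : null;

            // Close every element at the same or a deeper level than this line.
            while (open.size() > level) {
                String closing = open.pop();
                handler.endElement("", closing, closing);
            }

            AttributesImpl atts = new AttributesImpl();
            if (id != null) atts.addAttribute("", "id", "id", "CDATA", id);
            handler.startElement("", tag, tag, atts);
            open.push(tag);
            if (value != null) handler.characters(value.toCharArray(), 0, value.length());
        }

        while (!open.isEmpty()) {
            String closing = open.pop();
            handler.endElement("", closing, closing);
        }
        handler.endElement("", "GED", "GED");
        handler.endDocument();
    }
}
```

Any ContentHandler can consume these events, so the object-model builder never needs to see raw GEDCOM text. This sketch does no validation: malformed lines, skipped levels, and CONT/CONC continuation handling are left out.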
GEDCOM Export Visitor pattern 600 LoC
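As a rough illustration of the visitor approach, the sketch below walks a two-class toy model and writes GEDCOM lines. The names ModelVisitor, Person, Name, and GedcomExportVisitor are hypothetical stand-ins; the project's real model is far larger and its visitor interface may look different.

```java
import java.util.ArrayList;
import java.util.List;

/** Callbacks invoked as the model is walked; one method per model class. */
interface ModelVisitor {
    void visit(Person person);
    void visit(Name name);
}

final class Name {
    final String value;   // for example "Dallan /Quass/"
    final String surname; // may be null

    Name(String value, String surname) {
        this.value = value;
        this.surname = surname;
    }

    void accept(ModelVisitor visitor) {
        visitor.visit(this);
    }
}

final class Person {
    final String id;
    final List<Name> names = new ArrayList<>();

    Person(String id) {
        this.id = id;
    }

    // The model controls traversal order: the person first, then its children.
    void accept(ModelVisitor visitor) {
        visitor.visit(this);
        for (Name name : names) {
            name.accept(visitor);
        }
    }
}

/** Writes GEDCOM lines while visiting; levels are hard-coded for this tiny model. */
final class GedcomExportVisitor implements ModelVisitor {
    private final StringBuilder out = new StringBuilder();

    @Override
    public void visit(Person person) {
        out.append("0 @").append(person.id).append("@ INDI\n");
    }

    @Override
    public void visit(Name name) {
        out.append("1 NAME ").append(name.value).append('\n');
        if (name.surname != null) {
            out.append("2 SURN ").append(name.surname).append('\n');
        }
    }

    String result() {
        return out.toString();
    }
}
```

Keeping traversal in the model and output in the visitor means another exporter can reuse the same walk without touching the model classes.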
JSON: GEDCOM -> POJO -> JSON -> POJO -> GEDCOM. Simple model persistence using Google GSON
Further thoughts
Do we need a radically different data-exchange model for genealogy?
I don't know. A newly proposed object model could use this project to migrate existing GEDCOMs to the de facto model, then translate the de facto model objects to the new model.
Do we need GEDCOM validation tools?
Definitely! A list of “standard” custom tags would also be pretty helpful
We live in the real world
Purpose of this project
Demonstration of Gedcom Server
Conclusion
Images appearing on these slides are copyrighted by the contributors to http://commons.wikimedia.org and are used under license.
 
