Human Protein
Reference Database
An analysis of the technology
powering the database and website,
and how it was developed.
Kiran Jonnalagadda
Facts About HPRD
• HPRD is a database of all disease-causing
proteins in the human body.
• It is the most comprehensive database of
its kind in the world today.
• Unlike most other biological databases,
HPRD is protein-centric, not gene-centric.

Factors Leading to Choice of DB
• The biologists hadn’t settled on what
information was to be stored and therefore
the data type definitions changed often.
• Several data types were similar to one
another but not identical.
• Future extensions had to be built by tech-savvy
biologists with minimal assistance from programmers.
What We Used
• The Zope application server, comprising:
– The Web publishing object framework.
– ZODB, the object database storage system.
– ZCatalog, the indexing and search system.
– ZEO, the stand-alone database server for
multiple front-end Web servers.

Why an RDBMS Was Not Suited
• Data type definitions changed frequently. In
an RDBMS, this would have meant
redefining tables every week.
• The code currently has about forty data
classes. Imagine having that many data
tables, plus tables for relationships between
them, all under frequent revision.
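For contrast, in an object store a changed definition need not touch the stored data. The following is a minimal sketch using only Python's standard `pickle` module (ZODB also serializes objects as pickles, but this is not ZODB code; the `Protein` class and its fields are hypothetical). It shows a common object-database idiom: after a field is added, a class-level attribute supplies the default, so instances saved under the old definition still load without any migration.

```python
import pickle


class Protein:
    """Hypothetical data class, first definition: two fields."""

    def __init__(self, name, description):
        self.name = name
        self.description = description


# Store an instance written under the first definition.
stored = pickle.dumps(Protein("P1", "placeholder record"))


# Later the definition changes. Redefining the class here stands in for
# editing the source and restarting the server.
class Protein:
    """Hypothetical data class, revised definition: adds a field."""

    # Class-level default: old instances lack this attribute in their own
    # __dict__, so attribute lookup falls back to this value.
    localization = "unknown"

    def __init__(self, name, description, localization="unknown"):
        self.name = name
        self.description = description
        self.localization = localization


# pickle looks the class up by name, so it now finds the revised class.
old = pickle.loads(stored)
print(old.name, old.localization)  # P1 unknown
```

No table was altered and no stored record was rewritten; the old object simply reads the new default until it is next saved with a real value.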
How Zope Handled These Issues
• Zope is built on Python, which offers
dynamic data structures.
• ZODB uses this ability to make the entire
database look like one large data structure,
transparently swapping unused parts to
disk and recovering them as needed.
• ZCatalog indexes data for searching.
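The idea behind a catalog can be sketched as an inverted index. Each indexed attribute maps values to the set of object paths that carry them, so a query becomes a set intersection instead of a walk over every object. This is a toy, standard-library sketch, not ZCatalog's API (ZCatalog supports several kinds of index and a different interface); `Catalog`, `add`, `search` and the attribute names are hypothetical.

```python
from collections import defaultdict


class Catalog:
    """Toy inverted index over selected attributes (not the ZCatalog API)."""

    def __init__(self, *indexed_attrs):
        # attribute name -> attribute value -> set of object paths
        self._indexes = {attr: defaultdict(set) for attr in indexed_attrs}

    def add(self, path, obj):
        """Record obj's path under each indexed attribute it defines."""
        for attr, index in self._indexes.items():
            if hasattr(obj, attr):
                index[getattr(obj, attr)].add(path)

    def search(self, **criteria):
        """Return sorted paths matching every attribute=value pair."""
        matches = None
        for attr, value in criteria.items():
            hits = self._indexes[attr].get(value, set())
            matches = hits if matches is None else matches & hits
        return sorted(matches or ())


class Record:
    """Hypothetical stand-in for a stored data object."""

    def __init__(self, **fields):
        self.__dict__.update(fields)


catalog = Catalog("kind", "location")
catalog.add("/proteins/a", Record(kind="kinase", location="membrane"))
catalog.add("/proteins/b", Record(kind="receptor", location="membrane"))
catalog.add("/proteins/c", Record(kind="kinase", location="nucleus"))
print(catalog.search(kind="kinase", location="membrane"))  # ['/proteins/a']
```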
At Zope’s Core is Python
• Python is a dynamic language.
• When I say dynamic, I mean everything is dynamic!
• Code, variables, classes, modules, everything can
be modified at run-time.
• Most of Zope is built around this ability. Zope
would have been very hard to implement in a
less dynamic language.
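A small illustration, in plain Python with no Zope involved, of what "everything can be modified at run-time" means. The names `Record`, `describe` and `Interaction` are made up for the example.

```python
class Record:
    """A hypothetical data class, defined the ordinary way."""

    def __init__(self, name):
        self.name = name


r = Record("r1")


def describe(self):
    """Defined outside any class; attached to Record below."""
    return f"{type(self).__name__} {self.name}"


# Attach a new method to an existing class; existing instances see it at once.
Record.describe = describe
print(r.describe())  # Record r1

# Give one instance a new field at run time.
setattr(r, "note", "added later")

# Create a new subclass at run time from a name, bases and members.
Interaction = type("Interaction", (Record,), {"kind": "interaction"})
i = Interaction("i1")
print(i.describe(), i.kind)  # Interaction i1 interaction
```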

Data Storage in Zope
• In Zope, data is stored in instances of a data class.
• The data class has variables, which are like fields,
and methods, which manipulate data.
• Instances of a data class (objects) are stored in
the ZODB and together make up the database.
• Objects can contain other objects, forming
hierarchies.
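In plain Python terms, the model above looks roughly like the following sketch. In Zope these classes would also subclass ZODB's persistent base class so the database can store them; here they are ordinary classes, and `Folder`, `Protein` and `walk` are hypothetical.

```python
class Folder:
    """Hypothetical container: holds child objects by id, forming a tree."""

    def __init__(self, obj_id):
        self.id = obj_id
        self._children = {}

    def add(self, obj):
        """Store obj under its id and return it, for chaining."""
        self._children[obj.id] = obj
        return obj

    def children(self):
        return list(self._children.values())


class Protein:
    """Hypothetical data class: attributes are fields, methods act on them."""

    def __init__(self, obj_id, name):
        self.id = obj_id
        self.name = name
        self.interactors = []

    def add_interactor(self, other_id):
        """Record an interaction once, ignoring duplicates."""
        if other_id not in self.interactors:
            self.interactors.append(other_id)


def walk(node, depth=0):
    """Yield (depth, id) for every object in the tree, parents first."""
    yield depth, node.id
    if isinstance(node, Folder):
        for child in node.children():
            yield from walk(child, depth + 1)


root = Folder("root")
proteins = root.add(Folder("proteins"))
p1 = proteins.add(Protein("p1", "Example protein"))
p1.add_interactor("p2")
p1.add_interactor("p2")  # duplicate, ignored
print(list(walk(root)))  # [(0, 'root'), (1, 'proteins'), (2, 'p1')]
```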

Components of Zope
• ZServer (built on Medusa)
– Handles incoming requests.
– Serves HTTP, FTP, WebDAV and XML-RPC; SOAP support is expected soon.

• ZPublisher
– Maps URLs to objects and handles security.

• ZODB (Zope Object DataBase)
– Stores objects on disk in a transactional DB.

• ZEO (Zope Enterprise Objects)
– ZODB server for multiple Zope front-end servers.
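How a publisher maps a URL to an object can be approximated by a loop that walks the object hierarchy one path segment at a time. This is a simplified sketch, not ZPublisher's actual code; `Node` and `traverse` are hypothetical. As a stand-in for real security checks, it refuses names starting with an underscore, loosely modeled on Zope's rule that such names are not publishable.

```python
class Node:
    """Hypothetical published object whose children are attributes."""

    def __init__(self, **children):
        self.__dict__.update(children)


def traverse(root, url_path):
    """Map a URL path such as '/proteins/p1' to an object under root."""
    obj = root
    for name in (part for part in url_path.split("/") if part):
        if name.startswith("_"):
            # Private names are never reachable from the Web.
            raise LookupError(f"{name!r} is private")
        try:
            obj = getattr(obj, name)
        except AttributeError:
            raise LookupError(f"no object named {name!r}") from None
    return obj


site = Node(proteins=Node(p1="Example protein record"), _secret="hidden")
print(traverse(site, "/proteins/p1"))  # Example protein record
```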
Security in Zope
• Security is fine-grained.
• Security is defined around four concepts:
– Users, Roles, Permissions and Hierarchies.

• A user is assigned one or more roles.
• A role is assigned a set of permissions.
• This set can be reassigned at any position in
the hierarchy; objects below that point inherit it.
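The four concepts can be sketched as a permission check that walks up the hierarchy until it finds a node that defines the permission locally. Zope's real machinery is considerably richer; `SecureNode`, `allowed`, the users, the role "Curator" and the permission names below are all hypothetical.

```python
class SecureNode:
    """Hypothetical tree node with optional local permission settings."""

    def __init__(self, name, parent=None, permissions=None):
        self.name = name
        self.parent = parent
        # Local settings: permission name -> set of roles granted it here.
        # Permissions not listed are inherited from the nearest ancestor.
        self.permissions = dict(permissions or {})

    def roles_for(self, permission):
        """Return the roles holding permission here, searching upward."""
        node = self
        while node is not None:
            if permission in node.permissions:
                return node.permissions[permission]
            node = node.parent
        return set()


def allowed(user_roles, node, permission):
    """True if any of the user's roles holds permission at node."""
    return bool(set(user_roles) & node.roles_for(permission))


# Users are assigned roles; roles are granted permissions per node.
users = {"alice": {"Curator"}, "bob": {"Anonymous"}}
root = SecureNode("root", permissions={"View": {"Anonymous", "Curator"},
                                       "Edit": {"Curator"}})
drafts = SecureNode("drafts", parent=root, permissions={"View": {"Curator"}})
entry = SecureNode("entry", parent=drafts)

print(allowed(users["bob"], root, "View"))     # True
print(allowed(users["bob"], entry, "View"))    # False: drafts overrides View
print(allowed(users["alice"], entry, "Edit"))  # True: inherited from root
```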
Security Outside Zope
• Zope’s security mechanism is limited to the
Web front end.
• It is applied only to objects that directly
interface with the end user.
• Code in modules on the filesystem
has no security restrictions. It can do
anything.
Limitations in Zope
• The API for creating extensions (called
Products) is complicated and poorly
documented.
• The Property Manager interface is too
primitive. It only handles the very basic data
types such as strings, integers, boolean
fields, selection lists and multi-line text.
Our Extensions to Zope
• A framework for separating Zope specifics
from our data types, making it much
simpler to add new data types.
• An extended property management system
that could handle changes in data type
definitions and automatically migrate data.
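The slides do not describe how the extended property system was built. As a rough, hypothetical sketch of the migration half of the idea, each stored record could carry the schema version it was written under, and an ordered chain of upgrade steps could bring it up to date when it is loaded. `SCHEMA_VERSION`, `MIGRATIONS`, `upgrade` and the field changes are all invented for illustration.

```python
SCHEMA_VERSION = 3  # hypothetical current version of the definition


def _v1_to_v2(props):
    # Hypothetical change: one free-text "location" became a list.
    location = props.pop("location", "")
    props["locations"] = [location] if location else []
    return props


def _v2_to_v3(props):
    # Hypothetical change: a boolean "reviewed" flag, default False.
    props.setdefault("reviewed", False)
    return props


# Maps the version a record is at to the step that moves it up by one.
MIGRATIONS = {1: _v1_to_v2, 2: _v2_to_v3}


def upgrade(record):
    """Apply migrations in order until record reaches SCHEMA_VERSION."""
    version = record.get("version", 1)
    props = dict(record["props"])  # copy: leave the stored record untouched
    while version < SCHEMA_VERSION:
        props = MIGRATIONS[version](props)
        version += 1
    return {"version": version, "props": props}


print(upgrade({"version": 1, "props": {"name": "P1", "location": "nucleus"}}))
# {'version': 3, 'props': {'name': 'P1', 'locations': ['nucleus'], 'reviewed': False}}
```

Keeping each step small and version-to-version means a record saved under any older definition can be upgraded, which matters when definitions change weekly.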

Part II
User Interface
The rationale behind decisions
affecting how a user experiences the
database.
User Interface Design
• We started by exposing Zope’s hierarchy
as the public user interface.
• But there were some elements such as the
category browser and the

Templates for the Web UI
• Zope offers a choice of DTML and ZPT for templates.
• We chose ZPT as our templating system.

Part III
Project Management Lessons
What we learnt about managing a
project across continents and distant
time zones.
Project Management Issues 1
• We learnt the hard way that a project
manager’s place is with his team, not with
the client.
• Productivity suffers in the absence of an
effective collaboration tool.
• E-mail and instant messengers are not
effective collaboration tools.
Project Management Issues 2
• Collaboration over e-mail imposes the
burden of articulation on the
communicator, which many dislike and
therefore avoid.
• Instant messaging prevents collecting
thoughts before presenting them and is
therefore a poor planning tool.
Collaboration Tools
• We experimented with several
collaboration systems, with varying
effectiveness:
– Phone calls.
– Instant messengers.
– Wikis.
– Issue tracking software.
– Mailing lists.
Phone Calls
• The next best thing to face-to-face discussion.
• But a call connects only two people unless
special equipment is used.
• International calls are usually too expensive
for the resulting gain.

Instant Messengers
• Provide critical communication between
geographically distributed team members.
• But the pressure of maintaining continuity
in a conversation hinders pausing to gather
thoughts.
• Typing is much slower than talking,
so little else gets done during a conversation.
Wikis
• The easy hyperlinking system of a wiki
combined with structured text makes
presenting information a snap.
• With a little code added, wikis could
make a wonderful project management
tool.
• A change notification system is
needed, or changes go unnoticed.
Issue Tracking Software
• We use Bugzilla to track issues.
• But in eight months of use, only 30 issues have
been reported through it.
• The other few hundred were reported over e-mail, instant messengers and in person.
• We attribute this to Bugzilla’s poor usability.
The search for a new system is on.

Mailing Lists
• E-mail is a push medium: the latest message is always on
top of your inbox.
• E-mail makes an effective to-do list in an
interface the user is comfortable with.
• Mailing lists are e-mail in broadcast mode.
• Mailing lists have been the most effective
collaboration tool we’ve used so far.
Issues With Programmers
• Programmer skill levels and attitudes vary.
• C programmers tend to write C code in
Python.
• PHP programmers tend to write PHP code
in Python.
• Learning Python is easy but thinking in
Python takes a long time.
Programming Tools We Used
• CVS for source control.
• ViewCVS for a Web front-end to CVS.
• Vim in GUI mode for source editing
(the preferred editor of everyone on the team).
• The print statement for debugging.

Tools We Should Have Used
• WingIDE, a $35 package, provides an
interactive Python debugger that works with
Zope. Within a few minutes of use it would
have more than paid for itself in the hours
of programmer time we instead spent
debugging with print statements.
Part IV
Things Needing Fixing
Mistakes we made during
development, how they affect things
now, and how they can be fixed.
Naming Conventions
• We initially assumed HPRD was gene-centric and
named several things GeneSomething.
• In code, such a name can be treated as just an
identifier.
• But in a URL, it can confuse users, so these
names need changing.
Reusable Modules
• All of the code currently sits in one
directory.
• Several important pieces are generic and
independent of how HPRD uses them.
• These modules could be separated and
contributed independently to the open
source code pool.
Data in Code
• There are bits of implementation-specific
data embedded in code in some places,
particularly related to graph generation.
• These were introduced as quick patches
for a temporary problem but have
remained in place for months now.
• These need to be taken out so that the
code is truly reusable.
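One way to take such data out is to keep only generic defaults in code and read deployment-specific values from a separate file at run time. This is a minimal sketch assuming the settings can be expressed as JSON; the slides do not say what the embedded graph data actually is, so `DEFAULT_GRAPH_SETTINGS`, its keys and `load_graph_settings` are hypothetical.

```python
import json

# Generic defaults that can ship with the reusable code.
DEFAULT_GRAPH_SETTINGS = {"width": 600, "height": 400, "node_color": "gray"}


def load_graph_settings(text):
    """Merge deployment-specific settings (JSON text) over the defaults."""
    overrides = json.loads(text) if text.strip() else {}
    if not isinstance(overrides, dict):
        raise ValueError("graph settings must be a JSON object")
    unknown = set(overrides) - set(DEFAULT_GRAPH_SETTINGS)
    if unknown:
        # Catch typos instead of silently ignoring them.
        raise ValueError(f"unknown graph settings: {sorted(unknown)}")
    return {**DEFAULT_GRAPH_SETTINGS, **overrides}


print(load_graph_settings('{"width": 800}'))
# {'width': 800, 'height': 400, 'node_color': 'gray'}
```

In a deployment, the text would come from a file kept outside the code, for example `load_graph_settings(pathlib.Path("graph_settings.json").read_text())`, where the filename is hypothetical.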
Documentation
• DocStrings needed in code.
• Consistent language in DocStrings.
• HTML documentation files to be
distributed with code.

The technology of the Human Protein Reference Database (draft, 2003)

  • 1. Human Protein Reference Database An analysis of the technology powering the database and website, and how it was developed. Kiran Jonnalagadda
  • 2. Facts About HPRD • HPRD is a database of all disease causing proteins in the human body. • It is the most comprehensive database of its kind in the world today. • Unlike most other biological databases, HPRD is protein-centric, not gene-centric. 2
  • 3. Factors Leading to Choice of DB • The biologists hadn’t settled on what information was to be stored and therefore the data type definitions changed often. • Several data types were fairly similar to others but not the same. • Future extensions had to be built by techsavvy biologists with minimal assistance from programmers. 3
  • 4. What We Used • The Zope application server, comprising of: – – – – The Web publishing object framework. ZODB, the object database storage system. ZCatalog, the indexing and search system. ZEO, the stand-alone database server for multiple front-end Web servers. 4
  • 5. Why an RDBMS Was Not Suited • Data type definition changed frequently. In an RDBMS, this would have meant redefining tables every week. • The code currently has about forty data classes. Imagine having that many data tables, plus tables for relationships between them, all under frequent revision. 5
  • 6. How Zope Handled These Issues • Zope is built on Python, which offers dynamic data structures. • ZODB uses this ability to makes the entire database look like one large data structure, transparently swapping unused parts to disk and recovering them as needed. • ZCatalog indexes data for searching. 6
  • 7. At Zope’s Core is Python • Python is a dynamic language. • When I say dynamic, I mean everything is dynamic! • Code, variables, classes, modules, everything can be modified at run-time. • Most of Zope is built around this ability. Zope could not have been implemented in another language. 7
  • 8. Data Storage in Zope • In Zope, data is stored in instances of a data class. • The data class has variables, which are like fields, and methods, which manipulate data. • Instances of a data class (objects) are stored in the ZODB, making the database. • Objects can contain other objects, forming hierarchies. 8
  • 9. Components of Zope • ZServer (formerly Medusa) – Handles incoming requests. – Does HTTP, FTP, WebDAV, XML-RPC; soon SOAP. • ZPublisher – Maps URLs to objects and handles security. • ZODB (Zope Object DataBase) – Stores objects on disk in a transactional DB. • ZEO (Zope Enterprise Objects) – ZODB server for multiple Zope front-end servers. 9
  • 10. Security in Zope • Security is fine grained. • Security is defined around four concepts: – Users, Roles, Permissions and Hierarchies. • A user is assigned one or more roles. • A role is assigned a set of permissions. • This set can be reassigned at different positions in the hierarchy. 10
  • 11. Security Outside Zope • Zope’s security mechanism is limited to the Web front. • It is applied only to objects that directly interface with the end-user. • Code written in a module in the filesystem has no security restrictions. It can do anything. 11
  • 12. Limitations in Zope • The API for creating extensions (called Products) is complicated and poorly documented. • The Property Manager interface is too primitive. It only handles the very basic data types such as strings, integers, boolean fields, selection lists and multi-line text. 12
  • 13. Our Extensions to Zope • A framework for separating Zope specifics from our data types, making it much simpler to add new data types. • An extended property management system that could handle changes in data type definitions and automatically migrate data. 13
  • 14. Part II User Interface The rationale behind decisions affecting how a user experiences the database.
  • 15. User Interface Design • We started by exposing Zope’s hierarchy as the public user interface. • But there were some elements, such as the category browser and the …
  • 16. Templates for the Web UI • Zope offers a choice of DTML and ZPT (Zope Page Templates) for templates. • We chose ZPT as our templating system.
  • 17. Part III: Project Management Lessons • What we learnt about managing a project across continents and distant time zones.
  • 18. Project Management Issues 1 • We learnt the hard way that a project manager’s place is with his team, not with the client. • Productivity suffers in the absence of an effective collaboration tool. • E-mail and instant messengers are not effective collaboration tools.
  • 19. Project Management Issues 2 • Collaboration over e-mail places the burden of articulation on the communicator, which many dislike and therefore avoid. • Instant messaging leaves no room to collect one’s thoughts before presenting them, and is therefore a poor planning tool.
  • 20. Collaboration Tools • We experimented with several collaboration systems, with varying effectiveness: – Phone calls. – Instant messengers. – Wikis. – Issue tracking software. – Mailing lists.
  • 21. Phone Calls • The next best thing to face-to-face discussion. • But a call connects only two people unless special conferencing equipment is used. • International calls are usually too expensive for the resulting gain.
  • 22. Instant Messengers • Provide critical communication between geographically distributed team members. • But the pressure to keep a conversation going makes it hard to pause and gather one’s thoughts. • Typing is much slower than talking, so little else gets done during a chat.
  • 23. Wikis • The easy hyperlinking of a wiki, combined with structured text, makes presenting information a snap. • With a little extra code, wikis could make a wonderful project management tool. • A page-change notification system is needed; otherwise changes go unnoticed.
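One simple way to build the missing page-change notification is to keep a fingerprint of every page and compare fingerprints on each check. The names that differ are the ones to announce, for example to the team's mailing list. This is a hypothetical sketch: `page_digest` and `changed_pages` are invented, and it leaves out the step of actually sending the notification.

```python
import hashlib


def page_digest(text):
    """Fingerprint a page's text so that edits can be detected cheaply."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def changed_pages(pages, last_seen):
    """Return the names of pages that are new or edited since the last check.

    `pages` maps page name -> current text. `last_seen` maps page name ->
    digest from the previous check and is updated in place.
    """
    changed = []
    for name, text in sorted(pages.items()):
        digest = page_digest(text)
        if last_seen.get(name) != digest:
            changed.append(name)
            last_seen[name] = digest
    return changed


seen = {}
wiki = {"FrontPage": "Welcome", "Schedule": "Release in May"}
print(changed_pages(wiki, seen))  # ['FrontPage', 'Schedule'] (first run)
print(changed_pages(wiki, seen))  # []
wiki["Schedule"] = "Release in June"
print(changed_pages(wiki, seen))  # ['Schedule']
```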
  • 24. Issue Tracking Software • We use Bugzilla to track issues. • But in eight months of use, only 30 issues have been reported through it. • The other few hundred were reported over e-mail, instant messengers and in person. • Clearly, the problem is Bugzilla’s usability. The search for a new system is on.
  • 25. Mailing Lists • E-mail is a push medium: the latest message is always at the top of your inbox. • E-mail makes an effective to-do list in an interface the user is already comfortable with. • Mailing lists are e-mail in broadcast mode. • Mailing lists have been the most effective collaboration tool we’ve used so far.
  • 26. Issues With Programmers • Programmer skill levels and attitudes vary. • C programmers tend to write C code in Python. • PHP programmers tend to write PHP code in Python. • Learning Python is easy, but thinking in Python takes a long time.
  • 27. Programming Tools We Used • CVS for source control. • ViewCVS as a Web front end to CVS. • Vim in GUI mode for source editing (the preferred editor of everyone on the team). • The print statement for debugging.
  • 28. Tools We Should Have Used • WingIDE, a $35 piece of software, provides an interactive Python debugger that can be used with Zope. • Within a few minutes of use it would have more than paid for itself, given the hours of programmer time we instead spent debugging with the print statement.
  • 29. Part IV: Things Needing Fixing • Mistakes we made during development, how they affect things now, and how they can be fixed.
  • 30. Naming Conventions • We started by assuming HPRD was gene-centric and named several things GeneSomething. • In code, such a name can be treated as just an identifier. • But in a URL it can confuse users, so it needs renaming.
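A rename does not have to break existing links. Here is a minimal sketch, assuming the old gene-centric names can be listed in a table. Any request that still uses an old path segment gets a permanent redirect to the new one. `RENAMED` and `resolve` are hypothetical, and `GeneSomething`/`ProteinSomething` stand in for the real names, as on the slide.

```python
# Hypothetical table of renamed URL segments: old gene-centric name -> new name.
RENAMED = {"GeneSomething": "ProteinSomething"}


def resolve(url_path):
    """Return (status, path) for a request path.

    Paths that still use an old name get a 301 (moved permanently) pointing at
    the renamed path, so existing bookmarks and links keep working.
    """
    segments = url_path.split("/")
    renamed = [RENAMED.get(segment, segment) for segment in segments]
    if renamed != segments:
        return 301, "/".join(renamed)
    return 200, url_path


print(resolve("/GeneSomething/42"))     # (301, '/ProteinSomething/42')
print(resolve("/ProteinSomething/42"))  # (200, '/ProteinSomething/42')
```

The same concern applies inside the database. Pickle-based storage such as ZODB records each object's class by module and class name. After renaming a class in code, it is therefore prudent to keep the old name as an alias (`GeneSomething = ProteinSomething`) until stored objects have been migrated.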
  • 31. Reusable Modules • All of the code currently sits in one directory. • Several important pieces are generic and do not depend on how HPRD uses them. • These modules could be separated and contributed independently to the open source code pool.
  • 32. Data in Code • Bits of implementation-specific data are embedded in the code in some places, particularly in graph generation. • They were introduced as quick patches for a temporary problem but have remained in place for months. • They need to be taken out so that the code is truly reusable.
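Moving embedded data out of the graph code could work like this. Defaults live in a configuration text, a site can override individual values, and the graph code only reads the resulting settings. The slides do not say which values were hard-coded, so the keys (`width`, `height`, `node_colour`) and the function `load_graph_settings` are hypothetical. The parsing uses Python's standard `configparser`.

```python
import configparser

# Hypothetical defaults for values that might otherwise be hard-coded in graph code.
DEFAULT_SETTINGS = """
[graph]
width = 600
height = 400
node_colour = steelblue
"""


def load_graph_settings(site_overrides=""):
    """Read graph settings: built-in defaults first, then site-specific overrides."""
    parser = configparser.ConfigParser()
    parser.read_string(DEFAULT_SETTINGS)
    if site_overrides:
        parser.read_string(site_overrides)  # later values replace earlier ones
    graph = parser["graph"]
    return {
        "width": graph.getint("width"),
        "height": graph.getint("height"),
        "node_colour": graph.get("node_colour"),
    }


print(load_graph_settings())
print(load_graph_settings("[graph]\nwidth = 800\n"))
```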
  • 33. Documentation • Docstrings are needed in the code. • The docstrings should use consistent language. • HTML documentation files should be distributed with the code.
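The function below illustrates consistent docstrings that can also be turned into HTML. It is hypothetical and uses one possible convention: a one-line summary, a blank line, then `Args:` and `Returns:` sections. Python's standard `pydoc` module can render such docstrings as HTML, and `python -m pydoc -w <module>` writes an HTML file for a whole module.

```python
import inspect
import pydoc


def normalise_identifier(identifier):
    """Return an identifier in canonical form.

    Surrounding whitespace is removed and letters are converted to
    uppercase, so that lookups do not depend on how a user typed it.

    Args:
        identifier: The text as entered, for example " p53 ".

    Returns:
        The canonical form, for example "P53".
    """
    return identifier.strip().upper()


# The first docstring line is the one-line summary shown by help() and pydoc.
summary = inspect.getdoc(normalise_identifier).splitlines()[0]

# The same docstring rendered as an HTML fragment by pydoc's HTML formatter.
html = pydoc.html.document(normalise_identifier)

print(summary)  # Return an identifier in canonical form.
```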

Editor's Notes

  1. Insert points here outlining the data requirements of HPRD.
  2. Needs more slides before this explaining the organization of a project in Zope.
  3. Backup for the statements on C and PHP programmers: In traditional (C89) C, every variable must be declared with an explicit data type at the start of a block, before any other statements, so it cannot be declared just before use. C programmers therefore tend to reuse temporary variables throughout a long function, and a C programmer new to Python will tend to write C code translated into Python. Examples of this style are initializing temporary variables to blank values (“” for strings and 0 for integers) and reusing the same variables instead of deleting them and using new ones or, better, writing nested functions. A typical bug caused by this style: one part of a long function expects a temporary variable to still hold its blank initial value, but an earlier part of the function was later extended to use the same variable and the programmer forgot to reset it afterwards, so the later code suddenly sees a stale value. Such bugs can wreak havoc in code that was functioning perfectly before. The problem with PHP programmers is not as severe. Because PHP’s object orientation is weak, PHP programmers likewise tend to write a bunch of functions when they should have defined a new class. The same code-management problems follow.
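The temporary-variable bug described above can be reproduced in a few lines. Both functions below are hypothetical. The first reuses a "blank" temporary in C style, and a loop added above the original code leaves a stale value in it. The second builds each result from fresh values, using a nested function as the note suggests.

```python
def summarise_c_style(records):
    """C-style: temporaries declared up front and reused across the function."""
    temp = ""                       # "blank" initial value, as in C
    names = []
    # Later extension: this loop was added and reuses `temp` without resetting it.
    for record in records:
        temp = record["name"]
        names.append(temp)
    # Original block: still assumes `temp` is blank, so it inherits a stale name.
    for record in records:
        if record.get("alias"):
            temp = temp + record["alias"] + ";"
    return names, temp


def summarise(records):
    """Idiomatic: each result is built in its own scope from fresh values."""
    def alias_list():
        # Locals here start fresh on every call; earlier code cannot leave them dirty.
        return "".join(r["alias"] + ";" for r in records if r.get("alias"))

    names = [r["name"] for r in records]
    return names, alias_list()


records = [{"name": "A", "alias": "a1"}, {"name": "B"}]
print(summarise_c_style(records))  # (['A', 'B'], 'Ba1;')  stale 'B' leaks in
print(summarise(records))          # (['A', 'B'], 'a1;')
```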