SlideShare uma empresa Scribd logo
1 de 25
Content Storage with Apache
Jackrabbit
2009-03-25, Jukka Zitting
Introducing JCR
Content Repository for Java™ Technology API
-> Version 1.0 defined in JSR 170, final in 2006
-> Version 2.0 defined in JSR 283, final (hopefully) in 2009
-> Open source RI and TCK developed in Apache Jackrabbit
Flexible, hierarchically ordered content store with features like
full text search, versioning, transactions, observation, etc.
Combines and extends features often found in file systems and
databases; a "best of both worlds" approach.
Introducing Apache Jackrabbit
Entered Apache Incubator
in 2004, graduated in 2006,
now a TLP with 24
committers
Version 1.0 (and JCR RI) in
2006, currently at 1.5.3.
Version 2.0 (and JCR 2.0 RI)
planned for 2009.
Also: JCR Commons, Sling
Components:
jackrabbit-api
jackrabbit-core
jackrabbit-standalone
jackrabbit-webapp
jackrabbit-jca
jackrabbit-ocm
jackrabbit-jcr-rmi
jackrabbit-webdav
jackrabbit-jcr-server
etc.
Structure of a content repository
Consists of one or more workspaces, one of which is the
default workspace.
A workspace consists of a tree of nodes, each of which can
have zero or more child nodes. Each workspace has a single
root node.
Nodes have properties. Properties are typed (string,
integer, date,  binary, reference, etc.) and can be either
single- or multivalued.
Structure, cont.
Each node or property has a name, and can be accessed
using a path. Names are namespaced.
A referenceable node has a UUID, through which it can be
accessed or referenced. Referential integrity is guaranteed.
Each node has a primary node type and zero or more mixin
types. Node types define the structure of a node. A node
can also be unstructured.
Content repository features
Read/write
Search (XPath and SQL)
XML import/export
Observation
Versioning
Node types
Locking
Access control
Atomic changes
XA transactions
Getting started with Jackrabbit
Download and run the standalone server:
java -jar jackrabbit-standalone-1.5.3.jar
-> Web interface at http://localhost:8080/
-> WebDAV at http://localhost:8080/repository/default
-> JCR access over RMI at http://localhost:8080/rmi
-> Repository data in ./jackrabbit
-> Configuration in ./jackrabbit/repository.xml
If you have a servlet container, use the webapp
If you have a J2EE application server, use jca
Remote access
JCR-RMI layer available
since Jackrabbit 0.9.
Good functional coverage,
not so good performance.
-> administrative tools
Look at clustering for
better performance.
Jackrabbit 1.6: spi2dav
o.a.j.rmi.repository:
new URLRemoteRepository(
    "http://.../rmi");
new RMIRemoteRepository(
    "//.../repository");
Classpath:
jcr-1.0.jar
jackrabbit-jcr-rmi-1.5.3.jar
jackrabbit-api-1.5.3.jar
(also in rmiregistry!)
Jackrabbit as a shared resource
JCA adapter in an
application server
-> accessed through JNDI
Jackrabbit webapp in a
servlet container
-> JNDI or servlet context
-> JNDI: complex setup
-> cross-context access
JNDI configuration with
jackrabbit-servlet:
<servlet>
  <servlet-name>Repository</servlet-name>
  <servlet-class>
  org.apache.jackrabbit.servlet.JNDIRepositoryServlet
  </servlet-class>
  <init-param>
    <param-name>location</param-name>
    <param-value<javax/jcr/Repository</param-value>
  </init-param>
</servlet>
Accessing the repository:
public class MyServlet extends HttpServlet {
    private final Repository repository =
        new ServletRepository(this);
}
Jackrabbit in embedded mode
RepositoryConfig config = RepositoryConfig.create(
      "/path/to/repository", "/path/to/repository.xml");
RepositoryImpl repository = RepositoryImpl.create(config);
try {
    // Use the javax.jcr interfaces to access the repository
} finally {
    repository.shutdown();
}
Embedded mode, cont.
Only a single instance can be
running at a time.
For concurrent access:
- multiple sessions
- RMI for remote access
- clustering
Also: TransientRepository
Maven coordinates
<dependency>
  <groupId>org.apache.jackrabbit</groupId>
  <artifactId>jackrabbit-core</artifactId>
  <version>1.5.3</version>
  <exclusions>
    <exclusion>
      <groupId>commons-logging</groupId>
      <artifactId>commons-logging</artifactId>
    </exclusion>
  </exclusions>
</dependency>
<dependency>
  <groupId>org.slf4j</groupId>
  <artifactId>slf4j-log4j12</artifactId>
  <version>1.5.3</version>
</dependency>
<dependency>
  <groupId>org.slf4j</groupId>
  <artifactId>jcl-over-slf4j</artifactId>
  <version>1.5.3</version>
</dependency>
Logging - what's this SLF4J thing?
Jackrabbit uses the SLF4J
logging facade for logging.
Benefits: Great for
embedded uses, can adapt
to 
Drawbacks: What do I put
in my classpath?
SLF4J in practice
SLF4J API is automatically
included as a dependency of
jackrabbit-core.
You need to explicitly add
the SLF4J implementation.
Jackrabit webapp, jca and
standalone use slf4j-log4j,
so you can use normal log4j
configuration.
Classpath with log4j:
slf4j-api-1.5.3.jar
slf4j-log4j-1.5.3.jar
log4j-1.2.14.jar
log4j.properties
Classpath with no logging:
slf4j-api-1.5.3.jar
slf4j-nop-1.5.3.jar
Repository configuration
Configuration in a repository.xml file. Default configuration
shipped with Jackrabbit. Structure defined in a DTD.
Contains global settings and a workspace configuration
template. The template is instantiated to a workspace.xml
configuration file for each new workspace.
Main elements: clustering, workspace, versioning, security,
persistence, search index, data store, file system
Persistence managers
Use one of the bundle
persistence managers.
-> select one for your db
Other persistence managers
mostly for backwards
compatibility.
Default based on embedded
Apache Derby.
Configuration:
driver,url,user,password
schema
schemaObjectPrefix
minBlobSize
bundleCacheSize
consistencyCheck/Fix
blockOnConnectionLoss
Needs CREATE TABLE
permissions!
Search index
Per-workspace search
indexes based on Apache
Lucene.
By default everything is
indexed. Use index
configuration to customize.
Clustering: Each cluster
node maintains local search
indexes.
Configuration:
min/maxMergeDocs
mergeFactor
analyzer
textFilterClasses
respectDocumentOrder
resultFetchSize
Performance:
property, type, ft -> fast
path -> slow
Text extraction
Full text indexing of file
contents based on various
parser libraries (POI,
PDFBox, etc.).
Currently only for the
jcr:data property with
correct jcr:mimeType.
Jackrabbit 2.0: indexing of
all binary properties.
textFilterClasses:
PlainTextExtractor
MsWordTextExtractor
MsExcelTextExtractor
MsPowerPointTextExtractor
PdfTextExtractor
OpenOfficeTextExtractor
RTFTextExtractor
HTMLTextExtractor
XMLTextExtractor
Data store - dealing with lots of data
Data store feature available
since Jackrabbit 1.4
Content-addressed storage
of large binary properties.
Completely transparent to
client applications.
Uses garbage collection to
remove unused data.
Implementations:
FileDataStore
DbDataStore
sandbox: S3DataStore
Garbage collection:
gc = si.createDSGC();
gc.scan();
gc.stopScan();
gc.deleteUnused();
Content modeling: David's model
1. Data First. Structure Later. Maybe.
2.Drive the content hierarchy, don't let it happen.
3.Workspaces are for clone(), merge() and update()
4.Beware of Same Name Siblings
5.References considered harmful
6.Files are Files are Files
7.ID's are evil
Content modeling: Example
CREATE TABLE author (
    id INTEGER,
    name VARCHAR,
);
CREATE TABLE post (
    id INTEGER,
    author INTEGER,
    posted DATETIME,
    title VARCHAR,
    body TEXT
);
/blog
  /jukka
    @name = Jukka Zitting
    /2009
      /03
        /25
          /hello
            @author = jukka
            @posted
            @title
            @body
Example, cont.
CREATE TABLE  comment (
    id INTEGER,
    post INTEGER,
    title VARCHAR,
    body TEXT
);
CREATE TABLE media (
    id INTEGER,
    post INTEGER,
    data BLOB,
    caption VARCHAR
);
/hello
  /comments
    /salut
      @title = Salut!
      @body
  /media [nt:folder]
    /image.jpg [nt:file]
    /code [nt:folder]
      /Example.java [nt:file]
Example, cont.
Versioning:
/hello [mix:versionable]
  @jcr:versionHistory
  @jcr:baseVersion
node.checkout();
// ...
node.save();
node.checkin();
Locking:
/hello [mix:lockable]
node.lock();
// ...
node.unlock();
Common issues: Content hierarchy
Jackrabbit doesn't support very flat content hierarchies.
You'll start seeing problems when you put more than 10k
child nodes under a single parent.
Solution: Add more depth to your hierarchy. Divide entries
by date, category, author, etc. If nothing else, use the first
letter(s) of a title, a content hash, or even a random number
to distribute the nodes.
Note: You can still access the entries as a single flat set
through search. The hierarchy is for browsing.
Common issues: Concurrent edits
Three ways to handle concurrent edits:
1. Merge changes
2. Fail conflicting changes 
3. Block concurrent changes
Jackrabbit does 1 by default, and falls back to 2 when
merge fails. You can explicitly opt for 3 by using the JCR
locking feature.
Estimate: How often conflicts would happen? Will the
benefits of locking be worth the overhead.
Questions?
 

Mais conteúdo relacionado

Mais procurados

Introduction to JCR and Apache Jackrabbi
Introduction to JCR and Apache JackrabbiIntroduction to JCR and Apache Jackrabbi
Introduction to JCR and Apache JackrabbiJukka Zitting
 
Introduction of Java GC Tuning and Java Java Mission Control
Introduction of Java GC Tuning and Java Java Mission ControlIntroduction of Java GC Tuning and Java Java Mission Control
Introduction of Java GC Tuning and Java Java Mission ControlLeon Chen
 
Kafka Security 101 and Real-World Tips
Kafka Security 101 and Real-World Tips Kafka Security 101 and Real-World Tips
Kafka Security 101 and Real-World Tips confluent
 
From Zero to Hero with Kafka Connect
From Zero to Hero with Kafka ConnectFrom Zero to Hero with Kafka Connect
From Zero to Hero with Kafka Connectconfluent
 
Introduction to Apache Camel
Introduction to Apache CamelIntroduction to Apache Camel
Introduction to Apache CamelClaus Ibsen
 
Logging using ELK Stack for Microservices
Logging using ELK Stack for MicroservicesLogging using ELK Stack for Microservices
Logging using ELK Stack for MicroservicesVineet Sabharwal
 
The columnar roadmap: Apache Parquet and Apache Arrow
The columnar roadmap: Apache Parquet and Apache ArrowThe columnar roadmap: Apache Parquet and Apache Arrow
The columnar roadmap: Apache Parquet and Apache ArrowDataWorks Summit
 
Flink Forward San Francisco 2019: Moving from Lambda and Kappa Architectures ...
Flink Forward San Francisco 2019: Moving from Lambda and Kappa Architectures ...Flink Forward San Francisco 2019: Moving from Lambda and Kappa Architectures ...
Flink Forward San Francisco 2019: Moving from Lambda and Kappa Architectures ...Flink Forward
 
Modern Java web applications with Spring Boot and Thymeleaf
Modern Java web applications with Spring Boot and ThymeleafModern Java web applications with Spring Boot and Thymeleaf
Modern Java web applications with Spring Boot and ThymeleafLAY Leangsros
 
Java11 New Features
Java11 New FeaturesJava11 New Features
Java11 New FeaturesHaim Michael
 
From Zero to Spring Boot Hero with GitHub Codespaces
From Zero to Spring Boot Hero with GitHub CodespacesFrom Zero to Spring Boot Hero with GitHub Codespaces
From Zero to Spring Boot Hero with GitHub CodespacesVMware Tanzu
 
MongoDB WiredTiger Internals
MongoDB WiredTiger InternalsMongoDB WiredTiger Internals
MongoDB WiredTiger InternalsNorberto Leite
 
Data ingestion and distribution with apache NiFi
Data ingestion and distribution with apache NiFiData ingestion and distribution with apache NiFi
Data ingestion and distribution with apache NiFiLev Brailovskiy
 
Zero-Copy Event-Driven Servers with Netty
Zero-Copy Event-Driven Servers with NettyZero-Copy Event-Driven Servers with Netty
Zero-Copy Event-Driven Servers with NettyDaniel Bimschas
 
Hyperspace for Delta Lake
Hyperspace for Delta LakeHyperspace for Delta Lake
Hyperspace for Delta LakeDatabricks
 
Apache Calcite: A Foundational Framework for Optimized Query Processing Over ...
Apache Calcite: A Foundational Framework for Optimized Query Processing Over ...Apache Calcite: A Foundational Framework for Optimized Query Processing Over ...
Apache Calcite: A Foundational Framework for Optimized Query Processing Over ...Julian Hyde
 

Mais procurados (20)

Introduction to JCR and Apache Jackrabbi
Introduction to JCR and Apache JackrabbiIntroduction to JCR and Apache Jackrabbi
Introduction to JCR and Apache Jackrabbi
 
RESTful API - Best Practices
RESTful API - Best PracticesRESTful API - Best Practices
RESTful API - Best Practices
 
Introduction of Java GC Tuning and Java Java Mission Control
Introduction of Java GC Tuning and Java Java Mission ControlIntroduction of Java GC Tuning and Java Java Mission Control
Introduction of Java GC Tuning and Java Java Mission Control
 
Kafka Security 101 and Real-World Tips
Kafka Security 101 and Real-World Tips Kafka Security 101 and Real-World Tips
Kafka Security 101 and Real-World Tips
 
From Zero to Hero with Kafka Connect
From Zero to Hero with Kafka ConnectFrom Zero to Hero with Kafka Connect
From Zero to Hero with Kafka Connect
 
Maven Introduction
Maven IntroductionMaven Introduction
Maven Introduction
 
Introduction to Apache Camel
Introduction to Apache CamelIntroduction to Apache Camel
Introduction to Apache Camel
 
Logging using ELK Stack for Microservices
Logging using ELK Stack for MicroservicesLogging using ELK Stack for Microservices
Logging using ELK Stack for Microservices
 
Spring Boot
Spring BootSpring Boot
Spring Boot
 
The columnar roadmap: Apache Parquet and Apache Arrow
The columnar roadmap: Apache Parquet and Apache ArrowThe columnar roadmap: Apache Parquet and Apache Arrow
The columnar roadmap: Apache Parquet and Apache Arrow
 
Flink Forward San Francisco 2019: Moving from Lambda and Kappa Architectures ...
Flink Forward San Francisco 2019: Moving from Lambda and Kappa Architectures ...Flink Forward San Francisco 2019: Moving from Lambda and Kappa Architectures ...
Flink Forward San Francisco 2019: Moving from Lambda and Kappa Architectures ...
 
Modern Java web applications with Spring Boot and Thymeleaf
Modern Java web applications with Spring Boot and ThymeleafModern Java web applications with Spring Boot and Thymeleaf
Modern Java web applications with Spring Boot and Thymeleaf
 
Java11 New Features
Java11 New FeaturesJava11 New Features
Java11 New Features
 
From Zero to Spring Boot Hero with GitHub Codespaces
From Zero to Spring Boot Hero with GitHub CodespacesFrom Zero to Spring Boot Hero with GitHub Codespaces
From Zero to Spring Boot Hero with GitHub Codespaces
 
MongoDB WiredTiger Internals
MongoDB WiredTiger InternalsMongoDB WiredTiger Internals
MongoDB WiredTiger Internals
 
Data ingestion and distribution with apache NiFi
Data ingestion and distribution with apache NiFiData ingestion and distribution with apache NiFi
Data ingestion and distribution with apache NiFi
 
Zero-Copy Event-Driven Servers with Netty
Zero-Copy Event-Driven Servers with NettyZero-Copy Event-Driven Servers with Netty
Zero-Copy Event-Driven Servers with Netty
 
MyRocks Deep Dive
MyRocks Deep DiveMyRocks Deep Dive
MyRocks Deep Dive
 
Hyperspace for Delta Lake
Hyperspace for Delta LakeHyperspace for Delta Lake
Hyperspace for Delta Lake
 
Apache Calcite: A Foundational Framework for Optimized Query Processing Over ...
Apache Calcite: A Foundational Framework for Optimized Query Processing Over ...Apache Calcite: A Foundational Framework for Optimized Query Processing Over ...
Apache Calcite: A Foundational Framework for Optimized Query Processing Over ...
 

Destaque

The new repository in AEM 6
The new repository in AEM 6The new repository in AEM 6
The new repository in AEM 6Jukka Zitting
 
Oak, the Architecture of the new Repository
Oak, the Architecture of the new RepositoryOak, the Architecture of the new Repository
Oak, the Architecture of the new RepositoryMichael Dürig
 
The architecture of oak
The architecture of oakThe architecture of oak
The architecture of oakMichael Dürig
 
The Zero Bullshit Architecture
The Zero Bullshit ArchitectureThe Zero Bullshit Architecture
The Zero Bullshit ArchitectureLars Trieloff
 
Mime Magic With Apache Tika
Mime Magic With Apache TikaMime Magic With Apache Tika
Mime Magic With Apache TikaJukka Zitting
 
Open source masterclass - Life in the Apache Incubator
Open source masterclass - Life in the Apache IncubatorOpen source masterclass - Life in the Apache Incubator
Open source masterclass - Life in the Apache IncubatorJukka Zitting
 
File System On Steroids
File System On SteroidsFile System On Steroids
File System On SteroidsJukka Zitting
 
Apache development with GitHub and Travis CI
Apache development with GitHub and Travis CIApache development with GitHub and Travis CI
Apache development with GitHub and Travis CIJukka Zitting
 
Java Content Repository avec Jackrabbit
Java Content Repository avec JackrabbitJava Content Repository avec Jackrabbit
Java Content Repository avec JackrabbitEmmanuel Hugonnet
 
Hosting huge amount of binaries in JCR
Hosting huge amount of binaries in JCRHosting huge amount of binaries in JCR
Hosting huge amount of binaries in JCRWoonsan Ko
 
Rapid JCR Applications Development with Sling
Rapid JCR Applications Development with SlingRapid JCR Applications Development with Sling
Rapid JCR Applications Development with SlingFelix Meschberger
 
恐るべきApache, Web勉強会@福岡
恐るべきApache, Web勉強会@福岡恐るべきApache, Web勉強会@福岡
恐るべきApache, Web勉強会@福岡Aya Komuro
 
Building Content Applications with JCR and OSGi
Building Content Applications with JCR and OSGiBuilding Content Applications with JCR and OSGi
Building Content Applications with JCR and OSGiCédric Hüsler
 
Demystifying Oak Search
Demystifying Oak SearchDemystifying Oak Search
Demystifying Oak SearchJustin Edelson
 
Into the TarPit: A TarMK Deep Dive
Into the TarPit: A TarMK Deep DiveInto the TarPit: A TarMK Deep Dive
Into the TarPit: A TarMK Deep DiveMichael Dürig
 
Shooting rabbits with sling
Shooting rabbits with slingShooting rabbits with sling
Shooting rabbits with slingTomasz Rękawek
 
Content Management Standards
Content Management StandardsContent Management Standards
Content Management StandardsDavid Nuescheler
 
Introducing Apricot, The Eclipse Content Management Platform
Introducing Apricot, The Eclipse Content Management PlatformIntroducing Apricot, The Eclipse Content Management Platform
Introducing Apricot, The Eclipse Content Management PlatformNuxeo
 

Destaque (20)

The new repository in AEM 6
The new repository in AEM 6The new repository in AEM 6
The new repository in AEM 6
 
Oak, the Architecture of the new Repository
Oak, the Architecture of the new RepositoryOak, the Architecture of the new Repository
Oak, the Architecture of the new Repository
 
The architecture of oak
The architecture of oakThe architecture of oak
The architecture of oak
 
The Zero Bullshit Architecture
The Zero Bullshit ArchitectureThe Zero Bullshit Architecture
The Zero Bullshit Architecture
 
Mime Magic With Apache Tika
Mime Magic With Apache TikaMime Magic With Apache Tika
Mime Magic With Apache Tika
 
Open source masterclass - Life in the Apache Incubator
Open source masterclass - Life in the Apache IncubatorOpen source masterclass - Life in the Apache Incubator
Open source masterclass - Life in the Apache Incubator
 
File System On Steroids
File System On SteroidsFile System On Steroids
File System On Steroids
 
Apache development with GitHub and Travis CI
Apache development with GitHub and Travis CIApache development with GitHub and Travis CI
Apache development with GitHub and Travis CI
 
Java Content Repository avec Jackrabbit
Java Content Repository avec JackrabbitJava Content Repository avec Jackrabbit
Java Content Repository avec Jackrabbit
 
Hosting huge amount of binaries in JCR
Hosting huge amount of binaries in JCRHosting huge amount of binaries in JCR
Hosting huge amount of binaries in JCR
 
Apache Jackrabbit
Apache JackrabbitApache Jackrabbit
Apache Jackrabbit
 
Rapid JCR Applications Development with Sling
Rapid JCR Applications Development with SlingRapid JCR Applications Development with Sling
Rapid JCR Applications Development with Sling
 
恐るべきApache, Web勉強会@福岡
恐るべきApache, Web勉強会@福岡恐るべきApache, Web勉強会@福岡
恐るべきApache, Web勉強会@福岡
 
Apache Hive 紹介
Apache Hive 紹介Apache Hive 紹介
Apache Hive 紹介
 
Building Content Applications with JCR and OSGi
Building Content Applications with JCR and OSGiBuilding Content Applications with JCR and OSGi
Building Content Applications with JCR and OSGi
 
Demystifying Oak Search
Demystifying Oak SearchDemystifying Oak Search
Demystifying Oak Search
 
Into the TarPit: A TarMK Deep Dive
Into the TarPit: A TarMK Deep DiveInto the TarPit: A TarMK Deep Dive
Into the TarPit: A TarMK Deep Dive
 
Shooting rabbits with sling
Shooting rabbits with slingShooting rabbits with sling
Shooting rabbits with sling
 
Content Management Standards
Content Management StandardsContent Management Standards
Content Management Standards
 
Introducing Apricot, The Eclipse Content Management Platform
Introducing Apricot, The Eclipse Content Management PlatformIntroducing Apricot, The Eclipse Content Management Platform
Introducing Apricot, The Eclipse Content Management Platform
 

Semelhante a Content Storage With Apache Jackrabbit

Jackrabbit setup configuration
Jackrabbit setup configurationJackrabbit setup configuration
Jackrabbit setup configurationprabakaranbrick
 
Introduction to Apache Tomcat 7 Presentation
Introduction to Apache Tomcat 7 PresentationIntroduction to Apache Tomcat 7 Presentation
Introduction to Apache Tomcat 7 PresentationTomcat Expert
 
Java is Container Ready - Vaibhav - Container Conference 2018
Java is Container Ready - Vaibhav - Container Conference 2018Java is Container Ready - Vaibhav - Container Conference 2018
Java is Container Ready - Vaibhav - Container Conference 2018CodeOps Technologies LLP
 
香港六合彩
香港六合彩香港六合彩
香港六合彩iewsxc
 
Rollin onj Rubyv3
Rollin onj Rubyv3Rollin onj Rubyv3
Rollin onj Rubyv3Oracle
 
Dynamic Languages Web Frameworks Indicthreads 2009
Dynamic Languages Web Frameworks Indicthreads 2009Dynamic Languages Web Frameworks Indicthreads 2009
Dynamic Languages Web Frameworks Indicthreads 2009Arun Gupta
 
The Java EE 6 platform
The Java EE 6 platformThe Java EE 6 platform
The Java EE 6 platformLorraine JUG
 
The Java Content Repository
The Java Content RepositoryThe Java Content Repository
The Java Content Repositorynobby
 
How to start using Scala
How to start using ScalaHow to start using Scala
How to start using ScalaNgoc Dao
 

Semelhante a Content Storage With Apache Jackrabbit (20)

Jackrabbit setup configuration
Jackrabbit setup configurationJackrabbit setup configuration
Jackrabbit setup configuration
 
Introduction to Apache Tomcat 7 Presentation
Introduction to Apache Tomcat 7 PresentationIntroduction to Apache Tomcat 7 Presentation
Introduction to Apache Tomcat 7 Presentation
 
GlassFish and JavaEE, Today and Future
GlassFish and JavaEE, Today and FutureGlassFish and JavaEE, Today and Future
GlassFish and JavaEE, Today and Future
 
Jira Rev002
Jira Rev002Jira Rev002
Jira Rev002
 
Hibernate
HibernateHibernate
Hibernate
 
bjhbj
bjhbjbjhbj
bjhbj
 
Java is Container Ready - Vaibhav - Container Conference 2018
Java is Container Ready - Vaibhav - Container Conference 2018Java is Container Ready - Vaibhav - Container Conference 2018
Java is Container Ready - Vaibhav - Container Conference 2018
 
香港六合彩
香港六合彩香港六合彩
香港六合彩
 
Java se7 features
Java se7 featuresJava se7 features
Java se7 features
 
Arquillian in a nutshell
Arquillian in a nutshellArquillian in a nutshell
Arquillian in a nutshell
 
Hibernate java and_oracle
Hibernate java and_oracleHibernate java and_oracle
Hibernate java and_oracle
 
.NET RDF APIs
.NET RDF APIs.NET RDF APIs
.NET RDF APIs
 
Rollin onj Rubyv3
Rollin onj Rubyv3Rollin onj Rubyv3
Rollin onj Rubyv3
 
Dynamic Languages Web Frameworks Indicthreads 2009
Dynamic Languages Web Frameworks Indicthreads 2009Dynamic Languages Web Frameworks Indicthreads 2009
Dynamic Languages Web Frameworks Indicthreads 2009
 
The Java EE 6 platform
The Java EE 6 platformThe Java EE 6 platform
The Java EE 6 platform
 
Java Cloud and Container Ready
Java Cloud and Container ReadyJava Cloud and Container Ready
Java Cloud and Container Ready
 
The Java Content Repository
The Java Content RepositoryThe Java Content Repository
The Java Content Repository
 
How to start using Scala
How to start using ScalaHow to start using Scala
How to start using Scala
 
Jsp and jstl
Jsp and jstlJsp and jstl
Jsp and jstl
 
Practical JRuby
Practical JRubyPractical JRuby
Practical JRuby
 

Mais de Jukka Zitting

MicroKernel & NodeStore
MicroKernel & NodeStoreMicroKernel & NodeStore
MicroKernel & NodeStoreJukka Zitting
 
Content extraction with apache tika
Content extraction with apache tikaContent extraction with apache tika
Content extraction with apache tikaJukka Zitting
 
Apache Jackrabbit @ Swiss Open Source Awards 2011
Apache Jackrabbit @ Swiss Open Source Awards 2011Apache Jackrabbit @ Swiss Open Source Awards 2011
Apache Jackrabbit @ Swiss Open Source Awards 2011Jukka Zitting
 
OSGifying the repository
OSGifying the repositoryOSGifying the repository
OSGifying the repositoryJukka Zitting
 
Repository performance tuning
Repository performance tuningRepository performance tuning
Repository performance tuningJukka Zitting
 
The return of the hierarchical model
The return of the hierarchical modelThe return of the hierarchical model
The return of the hierarchical modelJukka Zitting
 
Text and metadata extraction with Apache Tika
Text and metadata extraction with Apache TikaText and metadata extraction with Apache Tika
Text and metadata extraction with Apache TikaJukka Zitting
 
Mime Magic With Apache Tika
Mime Magic With Apache TikaMime Magic With Apache Tika
Mime Magic With Apache TikaJukka Zitting
 
Design and architecture of Jackrabbit
Design and architecture of JackrabbitDesign and architecture of Jackrabbit
Design and architecture of JackrabbitJukka Zitting
 

Mais de Jukka Zitting (11)

MicroKernel & NodeStore
MicroKernel & NodeStoreMicroKernel & NodeStore
MicroKernel & NodeStore
 
Content extraction with apache tika
Content extraction with apache tikaContent extraction with apache tika
Content extraction with apache tika
 
Apache Jackrabbit @ Swiss Open Source Awards 2011
Apache Jackrabbit @ Swiss Open Source Awards 2011Apache Jackrabbit @ Swiss Open Source Awards 2011
Apache Jackrabbit @ Swiss Open Source Awards 2011
 
OSGifying the repository
OSGifying the repositoryOSGifying the repository
OSGifying the repository
 
Repository performance tuning
Repository performance tuningRepository performance tuning
Repository performance tuning
 
The return of the hierarchical model
The return of the hierarchical modelThe return of the hierarchical model
The return of the hierarchical model
 
Text and metadata extraction with Apache Tika
Text and metadata extraction with Apache TikaText and metadata extraction with Apache Tika
Text and metadata extraction with Apache Tika
 
Mime Magic With Apache Tika
Mime Magic With Apache TikaMime Magic With Apache Tika
Mime Magic With Apache Tika
 
NoSQL Oakland
NoSQL OaklandNoSQL Oakland
NoSQL Oakland
 
Design and architecture of Jackrabbit
Design and architecture of JackrabbitDesign and architecture of Jackrabbit
Design and architecture of Jackrabbit
 
Apache Tika
Apache TikaApache Tika
Apache Tika
 

Último

Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilV3cube
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsRoshan Dwivedi
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 

Último (20)

Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 

Content Storage With Apache Jackrabbit

  • 2. Introducing JCR Content Repository for Java™ Technology API -> Version 1.0 defined in JSR 170, final in 2006 -> Version 2.0 defined in JSR 283, final (hopefully) in 2009 -> Open source RI and TCK developed in Apache Jackrabbit Flexible, hierarchically ordered content store with features like full text search, versioning, transactions, observation, etc. Combines and extends features often found in file systems and databases; a "best of both worlds" approach.
  • 3. Introducing Apache Jackrabbit Entered Apache Incubator in 2004, graduated in 2006, now a TLP with 24 committers Version 1.0 (and JCR RI) in 2006, currently at 1.5.3. Version 2.0 (and JCR 2.0 RI) planned for 2009. Also: JCR Commons, Sling Components: jackrabbit-api jackrabbit-core jackrabbit-standalone jackrabbit-webapp jackrabbit-jca jackrabbit-ocm jackrabbit-jcr-rmi jackrabbit-webdav jackrabbit-jcr-server etc.
  • 4. Structure of a content repository Consists of one or more workspaces, one of which is the default workspace. A workspace consists of a tree of nodes, each of which can have zero or more child nodes. Each workspace has a single root node. Nodes have properties. Properties are typed (string, integer, date,  binary, reference, etc.) and can be either single- or multivalued.
  • 5. Structure, cont. Each node or property has a name, and can be accessed using a path. Names are namespaced. A referenceable node has a UUID, through which it can be accessed or referenced. Referential integrity is guaranteed. Each node has a primary node type and zero or more mixin types. Node types define the structure of a node. A node can also be unstructured.
  • 6. Content repository features Read/write Search (XPath and SQL) XML import/export Observation Versioning Node types Locking Access control Atomic changes XA transactions
  • 7. Getting started with Jackrabbit Download and run the standalone server: java -jar jackrabbit-standalone-1.5.3.jar -> Web interface at http://localhost:8080/ -> WebDAV at http://localhost:8080/repository/default -> JCR access over RMI at http://localhost:8080/rmi -> Repository data in ./jackrabbit -> Configuration in ./jackrabbit/repository.xml If you have a servlet container, use the webapp If you have a J2EE application server, use jca
  • 8. Remote access JCR-RMI layer available since Jackrabbit 0.9. Good functional coverage, not so good performance. -> administrative tools Look at clustering for better performance. Jackrabbit 1.6: spi2dav o.a.j.rmi.repository: new URLRemoteRepository(     "http://.../rmi"); new RMIRemoteRepository(     "//.../repository"); Classpath: jcr-1.0.jar jackrabbit-jcr-rmi-1.5.3.jar jackrabbit-api-1.5.3.jar (also in rmiregistry!)
  • 9. Jackrabbit as a shared resource JCA adapter in an application server -> accessed through JNDI Jackrabbit webapp in a servlet container -> JNDI or servlet context -> JNDI: complex setup -> cross-context access JNDI configuration with jackrabbit-servlet: <servlet>   <servlet-name>Repository</servlet-name>   <servlet-class>   org.apache.jackrabbit.servlet.JNDIRepositoryServlet   </servlet-class>   <init-param>     <param-name>location</param-name>     <param-value<javax/jcr/Repository</param-value>   </init-param> </servlet> Accessing the repository: public class MyServlet extends HttpServlet {     private final Repository repository =         new ServletRepository(this); }
  • 10. Jackrabbit in embedded mode RepositoryConfig config = RepositoryConfig.create(       "/path/to/repository", "/path/to/repository.xml"); RepositoryImpl repository = RepositoryImpl.create(config); try {     // Use the javax.jcr interfaces to access the repository } finally {     repository.shutdown(); }
  • 11. Embedded mode, cont. Only a single instance can be running at a time. For concurrent access: - multiple sessions - RMI for remote access - clustering Also: TransientRepository Maven coordinates <dependency>   <groupId>org.apache.jackrabbit</groupId>   <artifactId>jackrabbit-core</artifactId>   <version>1.5.3</version>   <exclusions>     <exclusion>       <groupId>commons-logging</groupId>       <artifactId>commons-logging</artifactId>     </exclusion>   </exclusions> </dependency> <dependency>   <groupId>org.slf4j</groupId>   <artifactId>slf4j-log4j12</artifactId>   <version>1.5.3</version> </dependency> <dependency>   <groupId>org.slf4j</groupId>   <artifactId>jcl-over-slf4j</artifactId>   <version>1.5.3</version> </dependency>
  • 12. Logging - what's this SLF4J thing? Jackrabbit uses the SLF4J logging facade for logging. Benefits: Great for embedded uses, can adapt to  Drawbacks: What do I put in my classpath?
  • 13. SLF4J in practice SLF4J API is automatically included as a dependency of jackrabbit-core. You need to explicitly add the SLF4J implementation. Jackrabit webapp, jca and standalone use slf4j-log4j, so you can use normal log4j configuration. Classpath with log4j: slf4j-api-1.5.3.jar slf4j-log4j-1.5.3.jar log4j-1.2.14.jar log4j.properties Classpath with no logging: slf4j-api-1.5.3.jar slf4j-nop-1.5.3.jar
  • 14. Repository configuration Configuration in a repository.xml file. Default configuration shipped with Jackrabbit. Structure defined in a DTD. Contains global settings and a workspace configuration template. The template is instantiated to a workspace.xml configuration file for each new workspace. Main elements: clustering, workspace, versioning, security, persistence, search index, data store, file system
  • 15. Persistence managers Use one of the bundle persistence managers. -> select one for your db Other persistence managers mostly for backwards compatibility. Default based on embedded Apache Derby. Configuration: driver,url,user,password schema schemaObjectPrefix minBlobSize bundleCacheSize consistencyCheck/Fix blockOnConnectionLoss Needs CREATE TABLE permissions!
  • 16. Search index Per-workspace search indexes based on Apache Lucene. By default everything is indexed. Use index configuration to customize. Clustering: Each cluster node maintains local search indexes. Configuration: min/maxMergeDocs mergeFactor analyzer textFilterClasses respectDocumentOrder resultFetchSize Performance: property, type, ft -> fast path -> slow
  • 17. Text extraction Full text indexing of file contents based on various parser libraries (POI, PDFBox, etc.). Currently only for the jcr:data property with correct jcr:mimeType. Jackrabbit 2.0: indexing of all binary properties. textFilterClasses: PlainTextExtractor MsWordTextExtractor MsExcelTextExtractor MsPowerPointTextExtractor PdfTextExtractor OpenOfficeTextExtractor RTFTextExtractor HTMLTextExtractor XMLTextExtractor
  • 18. Data store - dealing with lots of data Data store feature available since Jackrabbit 1.4 Content-addressed storage of large binary properties. Completely transparent to client applications. Uses garbage collection to remove unused data. Implementations: FileDataStore DbDataStore sandbox: S3DataStore Garbage collection: gc = si.createDSGC(); gc.scan(); gc.stopScan(); gc.deleteUnused();
  • 19. Content modeling: David's model 1. Data First. Structure Later. Maybe. 2.Drive the content hierarchy, don't let it happen. 3.Workspaces are for clone(), merge() and update() 4.Beware of Same Name Siblings 5.References considered harmful 6.Files are Files are Files 7.ID's are evil
  • 20. Content modeling: Example CREATE TABLE author (     id INTEGER,     name VARCHAR, ); CREATE TABLE post (     id INTEGER,     author INTEGER,     posted DATETIME,     title VARCHAR,     body TEXT ); /blog   /jukka     @name = Jukka Zitting     /2009       /03         /25           /hello             @author = jukka             @posted             @title             @body
  • 21. Example, cont. CREATE TABLE  comment (     id INTEGER,     post INTEGER,     title VARCHAR,     body TEXT ); CREATE TABLE media (     id INTEGER,     post INTEGER,     data BLOB,     caption VARCHAR ); /hello   /comments     /salut       @title = Salut!       @body   /media [nt:folder]     /image.jpg [nt:file]     /code [nt:folder]       /Example.java [nt:file]
  • 22. Example, cont. Versioning: /hello [mix:versionable]   @jcr:versionHistory   @jcr:baseVersion node.checkout(); // ... node.save(); node.checkin(); Locking: /hello [mix:lockable] node.lock(); // ... node.unlock();
  • 23. Common issues: Content hierarchy Jackrabbit doesn't support very flat content hierarchies. You'll start seeing problems when you put more than 10k child nodes under a single parent. Solution: Add more depth to your hierarchy. Divide entries by date, category, author, etc. If nothing else, use the first letter(s) of a title, a content hash, or even a random number to distribute the nodes. Note: You can still access the entries as a single flat set through search. The hierarchy is for browsing.
  • 24. Common issues: Concurrent edits Three ways to handle concurrent edits: 1. Merge changes 2. Fail conflicting changes  3. Block concurrent changes Jackrabbit does 1 by default, and falls back to 2 when merge fails. You can explicitly opt for 3 by using the JCR locking feature. Estimate: How often conflicts would happen? Will the benefits of locking be worth the overhead.