1. Learning for Biomedical Information Extraction with ILP. Margherita Berardi, Vincenzo Giuliano, Donato Malerba
3. What is “Information Extraction”? As a task: filling slots in a database from sub-segments of text.
October 14, 2002, 4:00 a.m. PT
For years, Microsoft Corporation CEO Bill Gates railed against the economic philosophy of open-source software with Orwellian fervor, denouncing its communal licensing as a "cancer" that stifled technological innovation. Today, Microsoft claims to "love" the open-source concept, by which software code is made public to encourage improvement and development by outside programmers. Gates himself says Microsoft will gladly disclose its crown jewels--the coveted code behind the Windows operating system--to select customers. "We can be open source. We love the concept of shared source," said Bill Veghte, a Microsoft VP. "That's a super-important shift for us in terms of code access." Richard Stallman, founder of the Free Software Foundation, countered saying…
Slots to fill: NAME, TITLE, ORGANIZATION
4. What is “Information Extraction”? As a task: filling slots in a database from sub-segments of text. (Same article as on the previous slide.) IE fills the slots as follows:
NAME              TITLE    ORGANIZATION
Bill Gates        CEO      Microsoft
Bill Veghte       VP       Microsoft
Richard Stallman  founder  Free Soft..
15. … the learning strategy… Example: parallel search for the predicates even and odd. Seeds: even(0), odd(1). Simplest consistent clauses are found first, independently of the predicates to be learned.
16. … the learning strategy… Example: parallel search for the predicates even and odd. Seeds: even(2), odd(1). A predicate dependency is discovered!
Candidate clauses for even: even(X) ← succ(Y,X); even(X) ← succ(X,Y); even(X) ← succ(Y,X), succ(Z,Y); even(X) ← succ(X,Y), succ(Y,Z)
Candidate clauses for odd: odd(X) ← succ(Y,X); odd(X) ← succ(X,Y); odd(X) ← succ(Y,X), even(Y); odd(X) ← succ(Y,X), zero(Y)
23. Textual portions of papers were categorized in five classes: Abstract, Introduction, Materials & Methods, Discussion and Results. The abstract of each paper was processed. [Table: Avg. No. of categories correctly classified]
First I’ll introduce the peculiarities of SDM. They are particularly interesting because the practice of geo-referencing data has caused a growing demand for powerful exploratory data analysis techniques that go beyond classical statistical and data mining techniques and, among other things, support the analysis of socio-economic phenomena from a spatial point of view. In this talk I’ll focus on one specific task, the discovery of spatial association rules. For this purpose I’ll present ARES, a system to extract association rules from census data, and illustrate an application of ARES to mining spatial association rules on North West England 1998 census data in order to study the mortality risk in Greater Manchester county.
What is IE? As a task it is… Starting with some text… and an empty database with a defined ontology of fields and records, use the information in the text to fill the database.
ML… although this is an area where ML has not yet trounced the hand-built systems. In some of the latest evaluations, a hand-built system shared 1st place with an ML system. Many companies are now making a business from IE (from the Web): WhizBang, Inxight, Intelliseek, ClearForest.
Data sparseness, robustness
5-fold cross-validation (CV): the data set is divided into 5 folds; in turn, four folds are used for training and one for testing.
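The fold rotation can be sketched in a few lines of Python. This is only an illustration of the 4-train / 1-test scheme; the function name `k_fold_splits` and the placeholder document ids are assumptions, not part of the original experiments.

```python
def k_fold_splits(items, k=5):
    """Yield (train, test) pairs; each fold is the test set exactly once."""
    # Hypothetical helper: assign items to folds round-robin.
    folds = [items[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test


# Placeholder ids standing in for the processed abstracts (assumed data).
docs = [f"doc{i}" for i in range(10)]
splits = list(k_fold_splits(docs, k=5))

for train, test in splits:
    print(len(train), "training items,", len(test), "test items")
```

Every item ends up in the test set of exactly one of the five runs, so reported figures are averages over five train/test rounds.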
Initial ILP research dealt with concept learning in the form of predicate definition learning.
ATRE is a multiple-concept learning system, which solves the following problem:
Since the generation of a clause depends on the chosen seed, several seeds have to be chosen, such that at least one seed per incomplete predicate definition is kept. Therefore, the search space is actually a forest of as many search trees as the number of chosen seeds. The slide shows the parallel exploration of the forest for odd and even numbers. Specialization hierarchies are traversed top-down. Search proceeds towards deeper and deeper levels of the specialization hierarchies until at least a user-defined number of consistent clauses is found. A supervisor task decides whether the search should carry on or not on the basis of the results returned by the concurrent tasks. When the search is stopped, the supervisor selects the “best” consistent clause according to the user’s preference criterion. This strategy has the advantage that simpler consistent clauses are found first, independently of the predicates to be learned. First learning step. Consistent clauses are shown in red.
Second learning step
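A minimal Python sketch of the consistency test applied to candidate clauses for odd in the second step. It is not ATRE's implementation: the finite domain 0..9, the assumption that the first step already yielded even(0) (for instance through a clause such as even(X) <- zero(X)), and all names below are illustrative.

```python
DOMAIN = range(10)                  # assumed finite domain 0..9
EVEN_KNOWN = {0}                    # assumption: even(0) is derivable after step one
ODD_NEG = {x for x in DOMAIN if x % 2 == 0}   # negative examples for odd


def succ(a, b):
    """succ(a, b) holds when b is the successor of a inside the domain."""
    return a in DOMAIN and b in DOMAIN and b == a + 1


def zero(a):
    return a == 0


# Each candidate body is a test on X; Y is existentially quantified.
CANDIDATES = {
    "odd(X) <- succ(Y,X)": lambda x: any(succ(y, x) for y in DOMAIN),
    "odd(X) <- succ(X,Y)": lambda x: any(succ(x, y) for y in DOMAIN),
    "odd(X) <- succ(Y,X), zero(Y)": lambda x: any(succ(y, x) and zero(y) for y in DOMAIN),
    "odd(X) <- succ(Y,X), even(Y)": lambda x: any(succ(y, x) and y in EVEN_KNOWN for y in DOMAIN),
}


def covered(body):
    """All domain values X for which the clause body succeeds."""
    return {x for x in DOMAIN if body(x)}


seed = 1  # seed example odd(1)
consistent = [name for name, body in CANDIDATES.items()
              if seed in covered(body) and not (covered(body) & ODD_NEG)]
print(consistent)
```

Under these assumptions the two one-literal clauses cover negative examples and are discarded, while both two-literal clauses cover the seed without covering any negative example. The clause whose body mentions even(Y) is the one that expresses a dependency between the two predicates.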
If we guarantee the following two conditions: ……………………… then after a finite number of steps a theory $T$, which is complete and consistent, is built. If we denote by $\mathrm{LHM}(T_i)$ the least Herbrand model of a theory $T_i$, the stepwise construction of theories entails that $\mathrm{LHM}(T_i) \subseteq \mathrm{LHM}(T_{i+1})$ for each $i \in \{0, 1, \ldots, n-1\}$, since the addition of a clause to a theory can only augment the LHM.
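The monotonic growth of the least Herbrand model can be checked on a small example. The sketch below computes the LHM of ground definite clauses by naive forward chaining; the even/odd clauses, the finite domain 0..6 and all names are assumptions chosen for illustration, not output of ATRE.

```python
N = 6                      # assumed finite domain 0..N, so that the model is finite
D = range(N + 1)


def least_herbrand_model(clauses):
    """Naive fixpoint: add a head whenever all of its body atoms are in the model."""
    model = set()
    changed = True
    while changed:
        changed = False
        for head, body in clauses:
            if head not in model and all(atom in model for atom in body):
                model.add(head)
                changed = True
    return model


# Background knowledge as ground facts (clauses with an empty body).
background = [(("zero", 0), [])] + [(("succ", i, i + 1), []) for i in range(N)]

# Ground instances of three clauses (assumed textbook definition of even/odd).
c1 = [(("even", x), [("zero", x)]) for x in D]                          # even(X) <- zero(X)
c2 = [(("odd", x), [("succ", y, x), ("even", y)]) for x in D for y in D]  # odd(X) <- succ(Y,X), even(Y)
c3 = [(("even", x), [("succ", y, x), ("odd", y)]) for x in D for y in D]  # even(X) <- succ(Y,X), odd(Y)

theories = [background, background + c1, background + c1 + c2, background + c1 + c2 + c3]
models = [least_herbrand_model(t) for t in theories]

# Each added clause can only enlarge the model.
for i in range(len(models) - 1):
    print(f"LHM(T{i}) is a subset of LHM(T{i + 1}):", models[i] <= models[i + 1])
```

Note that only the last theory derives even(2), even(4) and so on: the recursive pair of clauses is needed together, which is why the models of the intermediate theories are strictly smaller.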
In order to guarantee the first of the two conditions it is possible to proceed as follows. First, a positive example $e^+$ of a predicate $p$ to be learned is selected, such that $e^+$ is not in $\mathrm{LHM}(T_i)$. The example $e^+$ is called the seed. Then the space of definite clauses more general than $e^+$ is explored, looking for a clause $C$, if any, such that $\mathrm{neg}(\mathrm{LHM}(T_i \cup \{C\})) = \emptyset$. In this way we guarantee that the second condition above holds as well. When found, $C$ is added to $T_i$, giving $T_{i+1}$. If some positive examples are not included in $\mathrm{LHM}(T_{i+1})$, then a new seed is selected and the process is repeated.

The second condition is more difficult to guarantee because of the non-monotonicity property. The approach followed in ATRE to remove inconsistency due to the addition of a clause to the theory consists of simple syntactic changes in the theory, which eventually create new layers. The layering of a theory introduces a first variation of the classical separate-and-conquer strategy sketched above, since the addition of a locally consistent clause generated in the conquer stage is preceded by a global consistency check.
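The seed-driven loop with a global consistency check can be sketched as follows. This is a simplified illustration under strong assumptions: the candidate clauses come from a small fixed pool instead of a search through specialization hierarchies, the domain is 0..6, and neither ATRE's layering nor its parallel search is modelled. All names are hypothetical.

```python
N = 6
D = range(N + 1)

BACKGROUND = [(("zero", 0), [])] + [(("succ", i, i + 1), []) for i in range(N)]

# Assumed pool of candidate clauses, simplest first, given as ground instances.
CANDIDATES = {
    "even(X) <- zero(X)": [(("even", x), [("zero", x)]) for x in D],
    "odd(X) <- succ(Y,X)": [(("odd", x), [("succ", y, x)]) for x in D for y in D],
    "odd(X) <- succ(Y,X), zero(Y)": [(("odd", x), [("succ", y, x), ("zero", y)]) for x in D for y in D],
    "odd(X) <- succ(Y,X), even(Y)": [(("odd", x), [("succ", y, x), ("even", y)]) for x in D for y in D],
    "even(X) <- succ(Y,X), odd(Y)": [(("even", x), [("succ", y, x), ("odd", y)]) for x in D for y in D],
}

POS = {("even", x) for x in D if x % 2 == 0} | {("odd", x) for x in D if x % 2 == 1}
NEG = {("even", x) for x in D if x % 2 == 1} | {("odd", x) for x in D if x % 2 == 0}


def lhm(clauses):
    """Least Herbrand model of ground definite clauses by forward chaining."""
    model, changed = set(), True
    while changed:
        changed = False
        for head, body in clauses:
            if head not in model and all(a in model for a in body):
                model.add(head)
                changed = True
    return model


def learn():
    names, theory = [], list(BACKGROUND)
    while True:
        uncovered = sorted(POS - lhm(theory))
        if not uncovered:
            break                       # theory is complete
        found = None
        for seed in uncovered:          # try several seeds, not just one
            for name, ground in CANDIDATES.items():
                if name in names:
                    continue
                model = lhm(theory + ground)
                # Global check: the seed is covered and no negative example is derived.
                if seed in model and not (model & NEG):
                    found = name
                    break
            if found:
                break
        if found is None:
            break                       # no consistent clause in the pool
        names.append(found)
        theory += CANDIDATES[found]
    return names, lhm(theory)


names, model = learn()
print(names)
```

In this toy run the seed even(2) cannot be covered until a clause for odd has been added, which illustrates why at least one seed per incomplete predicate definition is kept.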
Learning multi-relational patterns from multi-relational data and background knowledge. It allows navigating the relational structure of the data.