Rassenfosse - IProduct database of patent products pairs
1. IPRoduct:
A database of patent-product pairs
OECD Blue Sky III Forum
Ghent, Belgium, September 19–21, 2016
Gaétan de Rassenfosse
Assistant Professor in Innovation and IP Policy
Ecole polytechnique fédérale de Lausanne (EPFL), Switzerland
@gderasse
I acknowledge financial support from the U.S. NSF (SMA - 1645264)
2. We are not able to trace innovations to the marketplace
Innovations are intangibles and, hence, difficult to study.
Scholars have focused on patents (and scientific publications
and trademarks) as “output” of the innovation process.
But these are (i) abstract manifestations of science; (ii)
intermediate innovation output; (iii) not tied to market
outcome.
As a result, innovation studies have largely failed to study the
“real impact” of innovation. Exceptions exist, such as studies on
drug patents or case studies of particular technologies.
IPRoduct: A database of patent-product pairs 2
3. I want to observe innovations
at the point at which they reach consumers
4. The holy grail of innovation research
I propose to build a table that
links IPR data to product data.
Physical patent marking
means that it is difficult to
collect such information on a
large scale.
But companies can now use
virtual patent marking.
Marking allows them to collect damages for patent infringement with
respect to infringing activity that occurs before the infringer is
put on actual notice of the infringement.
5. The idea is simple: collect information available online
6. But implementation is quite challenging
Several technical issues for collecting and structuring the data:
– Format ranging from straight HTML files listing all products
and the associated patents to non-OCR PDF files
– Information buried in the deep web (dynamic pages and
forms)
– Product lines rather than actual products
– etc.
But great technology is available. We are developing software
to do the job. We use the Scala programming language,
developed by EPFL Prof. Martin Odersky and used by tech
companies such as LinkedIn and Foursquare for their big data
applications.
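The simplest case listed above, a plain page listing products with their patents, can be sketched as follows. This is a minimal illustration in Python (the project's own software is written in Scala). The sample page, the `Product:` line layout, and the function name `extract_pairs` are assumptions made up for the example; this is not the actual crawler or parser.

```python
import re

# Hypothetical, simplified marking page text (HTML tags already stripped).
SAMPLE_PAGE = """\
Product: Widget A
Protected by U.S. Pat. Nos. 7,123,456; 8,234,567 and D654,321
Product: Widget B
Protected by U.S. Patent No. 8,234,567
"""

# Matches comma-grouped utility patent numbers (e.g. 7,123,456)
# and design patent numbers (e.g. D654,321).
PATENT_RE = re.compile(r"\b(D?\d{1,3}(?:,\d{3})+)\b")

def extract_pairs(text):
    """Return (product, patent_number) pairs from a marking page's text."""
    pairs = []
    product = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("Product:"):
            # A new product block starts here.
            product = line[len("Product:"):].strip()
        elif product is not None:
            # Every patent number on this line belongs to the current product.
            for number in PATENT_RE.findall(line):
                pairs.append((product, number.replace(",", "")))
    return pairs

pairs = extract_pairs(SAMPLE_PAGE)
```

Real marking pages vary widely (dynamic pages, scanned PDF files, product lines rather than products), so a sketch like this would cover only the easiest format.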
7. We are currently developing the crawler and the parser
Timeline markers: 2016/09, 2017/09
Phase I (own funds); Phase II (NSF + own funds); Phase III (looking for sponsors)
Objectives: assessing feasibility; software development; small-scale database; scoping
Objectives: scaling up; UKIPO and EPO data; trademark data; price data; etc.
Objectives: manual collection; early assessment; public release of code and data
Data and code sharing policy will depend on sponsors
8. Overview of the Phase I database
About 30 companies associated with 1,000 products and 3,000
U.S. patent documents.
In fields as diverse as cosmetics, food and beverages, computer
software, telecommunications, medical devices, consumer
goods, pharmaceuticals and building material.
A: “Human Necessities”
B: “Perform. Operations, Transporting”
C: “Chemistry, Metallurgy”
D: “Textiles, Paper”
E: “Fixed Constructions”
F: “Mech. Eng., Light., Heat., Weapons”
G: “Physics”
H: “Electricity”
10. Number of patents per product
[Figure: number of products (y-axis, 0 to 350) by number of patents (x-axis, 1 to 20)]
Median number of patents per product: 3
(average of 7.74)
323 products are associated with
only 1 U.S. patent.
Can serve as a proxy for R&D
investments, complexity of
technology, evidence of patent
thickets, etc.
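As a rough sketch of how the figures on this slide could be derived from a table of patent-product pairs, the Python snippet below counts patents per product and summarizes the counts. The pairs are toy data made up for illustration; they are not taken from the IPRoduct database.

```python
from collections import Counter
from statistics import mean, median

# Toy (product, patent) pairs; made up for illustration only.
pairs = [
    ("P1", "US1"), ("P1", "US2"), ("P1", "US3"),
    ("P2", "US1"),
    ("P3", "US4"), ("P3", "US5"),
]

# Count distinct patents per product (set() drops duplicate pairs).
patents_per_product = Counter(product for product, _ in set(pairs))
counts = list(patents_per_product.values())

median_patents = median(counts)
mean_patents = mean(counts)
# Products associated with only one patent.
single_patent_products = sum(1 for c in counts if c == 1)
```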
11. Number of products per patent
[Figure: number of patents (y-axis, 0 to 1,400) by number of products (x-axis, 1 to 19)]
Median number of products per patent: 2
(average of 3.43)
1320 patents are associated with
just one product.
Can be used to measure product
similarity, patent importance, etc.
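One possible way to turn shared patents into a product-similarity measure is the Jaccard index over each product's set of patents. The slide does not name a specific measure, so the choice of Jaccard, the function name `jaccard`, and the toy pairs below are all assumptions for illustration.

```python
from collections import defaultdict

# Toy (product, patent) pairs; made up for illustration only.
pairs = [
    ("P1", "US1"), ("P1", "US2"), ("P1", "US3"),
    ("P2", "US1"),
    ("P3", "US4"), ("P3", "US5"),
]

products_by_patent = defaultdict(set)
patents_by_product = defaultdict(set)
for product, patent in pairs:
    products_by_patent[patent].add(product)
    patents_by_product[product].add(patent)

# Number of products each patent is linked to.
products_per_patent = {pat: len(prods) for pat, prods in products_by_patent.items()}

def jaccard(a, b):
    """Shared patents divided by all patents of products a and b (0.0 to 1.0)."""
    pa = patents_by_product.get(a, set())
    pb = patents_by_product.get(b, set())
    union = pa | pb
    return len(pa & pb) / len(union) if union else 0.0
```

Under this sketch, a patent linked to many products (a high value in `products_per_patent`) could also be read as a crude signal of patent importance.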
12. Proportion of products by average patent age
[Figure: proportion of products (y-axis, 0% to 10%) by average age in years (x-axis, 3 to 20)]
Mean age of patents associated with a product: 13 years
About 1% of products are associated with patents that were
filed, on average, less than 3 years ago.
May be indicative of the speed of
technological obsolescence.
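The average patent age of a product could be computed along the following lines. This is a sketch under stated assumptions: the filing dates and pairs are toy data, and the reference date (here the first day of the forum) and the 365.25-day year are choices made for the example, not details given in the slides.

```python
from datetime import date
from statistics import mean

# Toy filing dates and patent-product pairs; made up for illustration only.
filing_date = {
    "US1": date(2006, 9, 19),
    "US2": date(2012, 9, 19),
}
pairs = [("P1", "US1"), ("P1", "US2"), ("P2", "US1")]

# Assumed reference date against which patent age is measured.
REFERENCE = date(2016, 9, 19)

def average_patent_age(product):
    """Mean age, in years, of the patents linked to `product`."""
    ages = [
        (REFERENCE - filing_date[patent]).days / 365.25
        for prod, patent in pairs
        if prod == product
    ]
    return mean(ages)

ages_by_product = {p: average_patent_age(p) for p in {prod for prod, _ in pairs}}
```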
13. Proportion of patents with a family member in the jurisdiction
[Figure: proportion of patents (y-axis, 0% to 100%) for USPTO, EPO, JPO and Triadic]
Mean number of countries for which patent protection was sought: 9
45% of inventions are protected
at the USPTO, the EPO and the
JPO.
This attests to the high economic
value of patents associated with a
product.
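A share such as the triadic proportion could be computed from a mapping of patent families to the offices where a family member was filed. The sketch below uses made-up families and office sets; the names `families`, `share_filed_at` and `triadic_share` are hypothetical.

```python
# Toy mapping from patent family to the offices with a family member;
# made up for illustration only.
families = {
    "F1": {"USPTO", "EPO", "JPO"},
    "F2": {"USPTO"},
    "F3": {"USPTO", "EPO"},
    "F4": {"USPTO", "EPO", "JPO", "KIPO"},
}

TRIADIC = {"USPTO", "EPO", "JPO"}

def share_filed_at(office):
    """Proportion of families with at least one member at `office`."""
    return sum(office in offices for offices in families.values()) / len(families)

# A family is triadic when it has a member at all three offices.
triadic_share = sum(TRIADIC <= offices for offices in families.values()) / len(families)

# Mean number of offices per family in the toy data.
mean_offices = sum(len(offices) for offices in families.values()) / len(families)
```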
15. Maximum time length between two patents linked to the same product
Number of years between the oldest and the most recent patent in a product: 8
About 40% of products have
patents filed within a time interval of
at most 2,000 days (about 5.5 years).
Indicative of how long a company
has been developing the product,
and may inform about follow-on
innovation strategies.
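The time span between the oldest and the most recent patent of each product could be measured as sketched below. The filing dates and pairs are toy data made up for illustration, and the 2,000-day threshold simply mirrors the figure quoted on the slide.

```python
from collections import defaultdict
from datetime import date

# Toy filing dates and patent-product pairs; made up for illustration only.
filing_date = {
    "US1": date(2006, 9, 19),
    "US2": date(2012, 9, 19),
    "US3": date(2010, 1, 1),
    "US4": date(2011, 1, 1),
}
pairs = [
    ("P1", "US1"), ("P1", "US2"),
    ("P2", "US1"),
    ("P3", "US3"), ("P3", "US4"),
]

dates_by_product = defaultdict(list)
for product, patent in pairs:
    dates_by_product[product].append(filing_date[patent])

# Days between the oldest and the most recent filing, per product
# (0 for products linked to a single patent).
span_days = {p: (max(d) - min(d)).days for p, d in dates_by_product.items()}

# Share of products whose patents were all filed within 2,000 days.
share_within_2000_days = sum(s <= 2000 for s in span_days.values()) / len(span_days)
```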
16. Proportion of families linked to a product
Proportion of families linked to a product: 22%
[Figure: families linked to a product (y-axis, 0% to 80%), one bar per company. Data labels (number of patent families per company, starting with Company #1): 11, 5, 7, 4, 2, 20, 43, 15, 300, 122, 74, 56, 44, 169, 191, 750, 2151, 407, 57, 1382, 267, 446, 38, 392, 3199, 978, 7569, 3216, 690, 37077, 19886]
Company #1 has 11 families, and
73% of these families are linked
to a product.
Families not linked to a product
may signal failed commercialization
attempts, freedom-to-operate (FTO) patents,
etc.
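The per-company proportion of families linked to a product could be computed as in the sketch below. The company names, family identifiers and the set of linked families are toy data made up for illustration.

```python
# Toy data: each company's patent families, and the families that appear
# in at least one patent-product pair. Made up for illustration only.
company_families = {
    "Company A": {"F1", "F2", "F3", "F4"},
    "Company B": {"F5", "F6"},
}
linked_families = {"F1", "F2", "F3", "F5"}

# Share of each company's families that are linked to a product.
linked_share = {
    company: len(fams & linked_families) / len(fams)
    for company, fams in company_families.items()
}

# Families with no product link (candidates for closer inspection).
unlinked = {
    company: fams - linked_families
    for company, fams in company_families.items()
}
```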
18. Budget justification is an important policy aspect
The IPRoduct database will provide a concrete manifestation of
S&I and the role of government funding, in a systematic manner.
Can also be used to identify case studies, in order to increase
public engagement with science.
[Diagram labels: Patents, Papers, Grants, Procurement, Universities, PROs]
19. But policy relevance goes beyond budget justification
Provides a better understanding of how IP impacts the
economy.
Opens many research questions in many different disciplines
(e.g., economics, management, law, marketing, science of
science policy). For instance, it may provide new insights on
the private returns to patenting or the value of patents.
Novel industry-level S&T indicators
– Time to market: average patent age
– Patent density: average number of patents per product
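The two indicators above could be aggregated to the industry level roughly as follows. This is a sketch only: the rows are toy data, and the field layout (industry, product, number of patents, average patent age in years) is an assumption made for the example.

```python
from collections import defaultdict
from statistics import mean

# Toy product-level rows: (industry, product, number of patents,
# average patent age in years). Made up for illustration only.
products = [
    ("cosmetics", "P1", 2, 6.0),
    ("cosmetics", "P2", 4, 10.0),
    ("software", "P3", 10, 3.0),
]

rows_by_industry = defaultdict(list)
for industry, _, n_patents, avg_age in products:
    rows_by_industry[industry].append((n_patents, avg_age))

# Patent density: average number of patents per product.
# Time to market: average patent age across the industry's products.
indicators = {
    industry: {
        "patent_density": mean(n for n, _ in rows),
        "time_to_market": mean(age for _, age in rows),
    }
    for industry, rows in rows_by_industry.items()
}
```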
Also directly serves one purpose of the America Invents Act,
namely to increase transparency of the patent system.
20. Thank you!
Gaétan de Rassenfosse
Ecole polytechnique fédérale de Lausanne
Email: gaetan.derassenfosse@epfl.ch
Web: http://www.gder.info
Twitter: @gderasse