DBpediaNYD - A Silver Standard Benchmark Dataset for Semantic Relatedness in DBpedia


Determining the semantic relatedness (i.e., the strength of a relation) of two resources in DBpedia (or other Linked Data sources) is a problem that quite a few approaches have addressed in the recent past. However, there are no large-scale benchmark datasets for comparing such approaches, and it is an open problem to determine which of the approaches work better than others. Furthermore, large-scale datasets for training machine-learning-based approaches are not available. DBpediaNYD is a large-scale synthetic silver standard benchmark dataset which contains symmetric and asymmetric similarity values, obtained using a web search engine.


DBpediaNYD –
A Silver Standard Benchmark Dataset
for Semantic Relatedness in DBpedia

Heiko Paulheim, 10/22/13

Motivation
• There are quite a few approaches to entity ranking/statement weighting on Linked Data
– and DBpedia in particular

• Examples:
– Franz et al. (2009) – Tensor Decomposition
– Meij et al. (2009) – Machine Learning
– Mirizzi et al. (2010) – Web Search Engines
– Mulay and Kumar (2011) – Machine Learning
– Hees et al. (2012) – Crowd Sourcing
– Nunes et al. (2012) – Social Network Analysis


Motivation
• However,
– none of those have been competitively evaluated
– none of those have been evaluated at large scale

• Evaluation is done with
– small private data sets
– user studies

• Approaches using Machine Learning
– require training data
– which is expensive to obtain


The Dataset
• Large-scale dataset (several thousand instances)
– statements with strengths

• Strength value: Normalized Google Distance (NGD)

• f(x): number of search results containing x

• f(x,y): number of search results containing both x and y

• M: number of pages in search engine index

• NGD has been shown to correlate with human strength associations
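The formula itself does not appear in this transcript. Assuming the slide uses the standard definition of the Normalized Google Distance (Cilibrasi and Vitányi), it can be written with the symbols above as:

```latex
\mathrm{NGD}(x, y) =
  \frac{\max\{\log f(x), \log f(y)\} - \log f(x, y)}
       {\log M - \min\{\log f(x), \log f(y)\}}
```

Under this definition a value of 0 means that x and y always occur together, and larger values indicate a weaker association.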


The Dataset
• NGD is a symmetric value
– the NYD dataset also contains asymmetric values

• Asymmetric Normalized Google Distance

• f(x): number of search results containing x

• f(x,y): number of search results containing both x and y

• M: number of pages in search engine index
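A minimal Python sketch of both distances, using f(x), f(x,y) and M as defined above. The symmetric function follows the standard NGD definition. The asymmetric formula is not shown in this transcript, so `asymmetric_ngd` is only an assumed directed variant, chosen so that the symmetric value equals the larger of the two directed values; the dataset's actual definition may differ. The hit counts are made up for illustration.

```python
import math


def ngd(fx: float, fy: float, fxy: float, m: float) -> float:
    """Symmetric Normalized Google Distance (standard definition)."""
    log_fx, log_fy, log_fxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(log_fx, log_fy) - log_fxy) / (math.log(m) - min(log_fx, log_fy))


def asymmetric_ngd(fx: float, fy: float, fxy: float, m: float) -> float:
    """ASSUMED directed variant (x -> y); not taken from the slides."""
    return (math.log(fx) - math.log(fxy)) / (math.log(m) - math.log(fy))


# Made-up hit counts, for illustration only.
f_x, f_y, f_xy, m = 50_000, 8_000, 4_000, 10_000_000

sym = ngd(f_x, f_y, f_xy, m)
x_to_y = asymmetric_ngd(f_x, f_y, f_xy, m)
y_to_x = asymmetric_ngd(f_y, f_x, f_xy, m)

print(f"symmetric: {sym:.2f}, x -> y: {x_to_y:.2f}, y -> x: {y_to_x:.2f}")
```

With this assumed variant, the direction starting from the more frequently mentioned resource gets the larger distance, and the symmetric value coincides with it.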


Constructing the Dataset
• We sampled 10,000 statements
– with DBpedia resources as subject and object (e.g., no type statements, no literals)
– with dbpedia or dbpprop predicate

• ...and computed symmetric/asymmetric NGD
– using the labels as search strings
– using Yahoo BOSS
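The selection criteria above can be sketched as a simple filter over triples. This is an illustration only, not the author's sampling code: it assumes that "dbpedia" and "dbpprop" refer to the `http://dbpedia.org/ontology/` and `http://dbpedia.org/property/` namespaces, and the function names and toy triples are hypothetical.

```python
import random

RESOURCE_NS = "http://dbpedia.org/resource/"
# Assumed expansions of the "dbpedia" and "dbpprop" predicate prefixes.
PREDICATE_NAMESPACES = (
    "http://dbpedia.org/ontology/",
    "http://dbpedia.org/property/",
)


def is_candidate(subject: str, predicate: str, obj: str) -> bool:
    """Keep statements linking two DBpedia resources via a DBpedia predicate."""
    return (
        subject.startswith(RESOURCE_NS)
        and obj.startswith(RESOURCE_NS)  # drops literals and class URIs
        and predicate.startswith(PREDICATE_NAMESPACES)  # drops rdf:type etc.
    )


def sample_statements(triples, k, seed=0):
    """Draw up to k candidate statements uniformly at random."""
    candidates = [t for t in triples if is_candidate(*t)]
    return random.Random(seed).sample(candidates, min(k, len(candidates)))


# Toy triples for illustration.
triples = [
    ("http://dbpedia.org/resource/John_Lennon",
     "http://dbpedia.org/ontology/spouse",
     "http://dbpedia.org/resource/Yoko_Ono"),
    ("http://dbpedia.org/resource/John_Lennon",
     "http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
     "http://dbpedia.org/ontology/Person"),
    ("http://dbpedia.org/resource/John_Lennon",
     "http://dbpedia.org/property/birthName",
     "John Winston Lennon"),
]

sampled = sample_statements(triples, k=10)
```

Only the first toy triple survives the filter: the second is a type statement and the third has a literal as its object.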


The Dataset
• Random sample of 10,000 statements
– i.e., 30,000 search engine calls (three per statement; 80c per 1,000 calls → 24 USD)

• 3,058 pairs of resources had to be discarded
– f(x)<f(x,y) or f(y)<f(x,y)
– search engines sometimes don't count properly :-(

• Result:
– 6,942 weighted statements (symmetric)
– 13,884 weighted statements (asymmetric)
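The figures on this slide can be checked with a few lines of arithmetic. The sketch below assumes that the three calls per statement are the queries for f(x), f(y) and f(x,y), and that each retained pair yields one symmetric value and two directed values; the variable and function names are illustrative.

```python
statements = 10_000
calls_per_statement = 3          # assumed: f(x), f(y), f(x,y)
price_cents_per_1000_calls = 80

total_calls = statements * calls_per_statement
cost_usd = total_calls * price_cents_per_1000_calls // 1000 // 100


def counts_are_consistent(fx: int, fy: int, fxy: int) -> bool:
    """A joint hit count can never exceed either individual hit count."""
    return fxy <= fx and fxy <= fy


discarded = 3_058
symmetric_statements = statements - discarded
asymmetric_statements = 2 * symmetric_statements  # one value per direction

print(total_calls, cost_usd, symmetric_statements, asymmetric_statements)
```

Pairs for which `counts_are_consistent` returns `False` correspond to the discarded cases, where the search engine reported more joint hits than individual hits.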


The Dataset
• Example:
– dbpedia:John_Lennon and dbpedia:Yoko_Ono

• Distances:
– symmetric: 0.18
– John Lennon → Yoko Ono: 0.18
– Yoko Ono → John Lennon: 0.03

• Explanation:
– Yoko Ono is famous for being John Lennon's wife
• and most often mentioned in that context
– John Lennon is more famous for being a member of the Beatles


Example: the DBpedia FindRelated Service
• We trained two regression SVMs (LibSVM) based on DBpediaNYD
– one for symmetric, one for asymmetric distances
– the service allows for finding the most related among the linked resources

• Example results: (screenshot not included in this transcript)

• http://wiki.dbpedia.org/FindRelated


Conclusion and Outlook
• DBpediaNYD allows for large scale evaluation
– rather a silver standard
– does not replace manually created gold standards

• Future work
– validate DBpediaNYD with users
– compare search engines


Something Completely Different
• Challenges enumerated in the workshop intro this morning
– “Logical inference on noisy data”

• Talk on “Type Inference on Noisy RDF Data”
– Was actually applied for DBpedia 3.9
– Friday, 3:15, Bayside 204A



Mais conteúdo relacionado

Mais procurados

Similarity: Retrieving Documents

Learnbay Datascience

2019 03 05_biological_databases_part4_v_upload

Prof. Wim Van Criekinge

Connections that work: Linked Open Data demystified

Jakob .

Scholarly citations from one publication to another, expressed as reference lists within academic articles, are core elements of scholarly communication. Unfortunately, they usually can be accessed en masse only by paying significant subscription fees to commercial organizations, while those few services that do made them available for free impose strict limitations on their reuse. In this paper we provide an overview of the OpenCitations Project (http://opencitations.net) undertaken to remedy this situation, and of its main product, the OpenCitations Corpus, which is an open repository of accurate bibliographic citation data harvested from the scholarly literature, made available in RDF under a Creative Commons public domain dedication. Paper at: https://w3id.org/oc/paper/occ-lisc2016.html

Freedom for bibliographic references: OpenCitations arise

University of Bologna

PhyloTastic: names-based phyloinformatic data integration

Rutger Vos

Dbd arrrrcamp-2013

Peter Vandenabeele

Mais procurados (6)

Similarity: Retrieving Documents

2019 03 05_biological_databases_part4_v_upload

Connections that work: Linked Open Data demystified

Freedom for bibliographic references: OpenCitations arise

PhyloTastic: names-based phyloinformatic data integration

Dbd arrrrcamp-2013

Destaque

Using DBpedia for Thesaurus Management and Linked Open Data Integration

Martin Kaltenböck

Portails documentaires et référentiels du Web sémantique : exemples et enjeu...

Alexandre Monnin

Les quatre aveugles et l'éléphant web, ou les chroniques d'un web non documen...

ADBS

JIST2015-Computing the Semantic Similarity of Resources in DBpedia for Recomm...

GUANGYUAN PIAO

Requêtes sparql

FipBast

Les quatre aveugles et l'éléphant web, ou les chroniques d'un web non documen...

ADBS

Lancement de Semanticpédia et DBpédia.fr

Fabien Gandon

Thérèse Libourel, atelier Ontologies avec Protégé

UMR 7324 CITERES - Laboratoire Archéologie et Territoires, Tours

Thérèse Libourel, Ontologies en SHS, 2015-11-09, Tours

UMR 7324 CITERES - Laboratoire Archéologie et Territoires, Tours

Destaque (9)

Using DBpedia for Thesaurus Management and Linked Open Data Integration

Portails documentaires et référentiels du Web sémantique : exemples et enjeu...

Les quatre aveugles et l'éléphant web, ou les chroniques d'un web non documen...

JIST2015-Computing the Semantic Similarity of Resources in DBpedia for Recomm...

Requêtes sparql

Les quatre aveugles et l'éléphant web, ou les chroniques d'un web non documen...

Lancement de Semanticpédia et DBpédia.fr

Thérèse Libourel, atelier Ontologies avec Protégé

Thérèse Libourel, Ontologies en SHS, 2015-11-09, Tours

Semelhante a DBpediaNYD - A Silver Standard Benchmark Dataset for Semantic Relatedness in DBpedia

Data_Science.ppt

ANGADPRAJAPATI3

Where is my data (in the cloud) tamir dresher

Tamir Dresher

Where is my data (in the cloud) tamir dresher

Tamir Dresher

Where is my data (in the cloud) tamir dresher

Tamir Dresher

Elag workshop sessie 1 en 2 v10

Jeroen Rombouts

Research Data Management

Sarah Jones

Links between datasets are an essential ingredient of Linked Open Data. Since the manual creation of links is expensive at large-scale, link sets are often created using heuristics, which may lead to errors. In this paper, we propose an unsupervised approach for finding erroneous links. We represent each link as a feature vector in a higher dimensional vector space, and find wrong links by means of different multi-dimensional outlier detection methods. We show how the approach can be implemented in the RapidMiner platform using only off-the-shelf components, and present a first evaluation with real-world datasets from the Linked Open Data cloud showing promising results, with an F-measure of up to 0.54, and an area under the ROC curve of up to 0.86.

Identifying Wrong Links between Datasets by Multi-dimensional Outlier Detection

Heiko Paulheim

We provide real time big data training in Chennai by industrial experts with real time scenarios. Our Advanced topics will enhance the students expectations into high level knowledge in Big Data Technology. For More Info.Reach our Big Data Technical Team@ +91 96677211551/56 The Experience of Big data Training Experts Team. www.thecreatingexperts.com SAP BEST INSTITUTES IN CHENNAI http://www.youtube.com/watch?v=UpWthI0P-7g

Big Data Real Time Training in Chennai

Vijay Susheedran C G

Big Data 101 - An introduction

Neeraj Tewari

Ordering the chaos: Creating websites with imperfect data

Andy Stretton

Research Lifecycles and RDM

Marieke Guy

Quettra Design Problem Solution - Deepti Chafekar

quettra

Feature selection is an important preprocessing step in data mining, which has an impact on both the runtime and the result quality of the subsequent processing steps. While there are many cases where hierarchic relations between features exist, most existing feature selection approaches are not capable of exploiting those relations. In this paper, we introduce a method for feature selection in hierarchical feature spaces. The method first eliminates redundant features along paths in the hierarchy, and further prunes the resulting feature set based on the features' relevance. We show that our method yields a good trade-off between feature space compression and classification accuracy, and outperforms both standard approaches as well as other approaches which also exploit hierarchies.

DS2014: Feature selection in hierarchical feature spaces

Petar Ristoski

Over the last years, the Semantic Web has been growing steadily. Today, we count more than 10,000 datasets made available online following Semantic Web standards. Nevertheless, many applications, such as data integration, search, and interlinking, may not take the full advantage of the data without having a priori statistical information about its internal structure and coverage. In fact, there are already a number of tools, which offer such statistics, providing basic information about RDF datasets and vocabularies. However, those usually show severe deficiencies in terms of performance once the dataset size grows beyond the capabilities of a single machine. In this paper, we introduce a software component for statistical calculations of large RDF datasets, which scales out to clusters of machines. More specifically, we describe the first distributed inmemory approach for computing 32 different statistical criteria for RDF datasets using Apache Spark. The preliminary results show that our distributed approach improves upon a previous centralized approach we compare against and provides approximately linear horizontal scale-up. The criteria are extensible beyond the 32 default criteria, is integrated into the larger SANSA framework and employed in at least four major usage scenarios beyond the SANSA community.

DistLODStats: Distributed Computation of RDF Dataset Statistics - ISWC 2018 talk

Gezim Sejdiu

week1-thursday-2id50-q2-2021-2022-intro-and-basic-fd.ppt

RidoVercascade

This presentation is from the inaugural Atlanta Solr Meetup held on 2014/10/21 at Atlanta Tech Village. Description: CareerBuilder uses Solr to power their recommendation engine, semantic search, and data analytics products. They maintain an infrastructure of hundreds of Solr servers, holding over a billion documents and serving over a million queries an hour across thousands of unique search indexes. Come learn how CareerBuilder has integrated Solr into their technology platform (with assistance from Hadoop, Cassandra, and RabbitMQ) and walk through api and code examples to see how you can use Solr to implement your own real-time recommendation engine, semantic search, and data analytics solutions. Speaker: Trey Grainger is the Director of Engineering for Search & Analytics at CareerBuilder.com and is the co-author of Solr in Action (2014, Manning Publications), the comprehensive example-driven guide to Apache Solr. His search experience includes handling multi-lingual content across dozens of markets/languages, machine learning, semantic search, big data analytics, customized Lucene/Solr scoring models, data mining and recommendation systems. Trey is also the Founder of Celiaccess.com, a gluten-free search engine, and is a frequent speaker at Lucene and Solr-related conferences.

Scaling Recommendations, Semantic Search, & Data Analytics with solr

Trey Grainger

Enriching Solr with Deep Learning for a Question Answering System - Sanket Sh...

Lucidworks

Datamininglecture

Manish Rana

We describe an approach to find similarities between RDF datasets, which may be applicable to tasks such as link discovery, dataset summarization or dataset understanding. Our approach builds on the assumption that similar datasets should have a similar structure and include semantically similar resources and relationships. It is based on the combination of Frequent Subgraph Mining (FSM) techniques, used to synthesize the datasets and find similarities among them. The result of this work can be applied for easing the task of data interlinking and for promoting data reusing in the Semantic Web. Full paper at: http://memaldi.github.io/pdf/iesd2015.pdf

Detection of Related Semantic Datasets Based on Frequent Subgraph Mining

Mikel Emaldi Manrique

data mining

nehaanand123

Semelhante a DBpediaNYD - A Silver Standard Benchmark Dataset for Semantic Relatedness in DBpedia (20)

Data_Science.ppt

Where is my data (in the cloud) tamir dresher

Elag workshop sessie 1 en 2 v10

Research Data Management

Identifying Wrong Links between Datasets by Multi-dimensional Outlier Detection

Big Data Real Time Training in Chennai

Big Data 101 - An introduction

Ordering the chaos: Creating websites with imperfect data

Research Lifecycles and RDM

Quettra Design Problem Solution - Deepti Chafekar

DS2014: Feature selection in hierarchical feature spaces

DistLODStats: Distributed Computation of RDF Dataset Statistics - ISWC 2018 talk

week1-thursday-2id50-q2-2021-2022-intro-and-basic-fd.ppt

Scaling Recommendations, Semantic Search, & Data Analytics with solr

Enriching Solr with Deep Learning for a Question Answering System - Sanket Sh...

Datamininglecture

Detection of Related Semantic Datasets Based on Frequent Subgraph Mining

data mining

Mais de Heiko Paulheim

Knowledge Graph Generation from Wikipedia in the Age of ChatGPT: Knowledge ...

Heiko Paulheim

Knowledge graph embeddings are a mechanism that projects each entity in a knowledge graph to a point in a continuous vector space. It is commonly assumed that those approaches project two entities closely to each other if they are similar and/or related. In this talk, I give a closer look at the roles of similarity and relatedness with respect to knowledge graph embeddings, and discuss how the well-known embedding mechanism RDF2vec can be tailored towards focusing on similarity, relatedness, or both.

What_do_Knowledge_Graph_Embeddings_Learn.pdf

Heiko Paulheim

New Adventures in RDF2vec

Heiko Paulheim

Using knowledge graphs in data mining typically requires a propositional, i.e., vector-shaped representation of entities. RDF2vec is an example for generating such vectors from knowledge graphs, relying on random walks for extracting pseudo-sentences from a graph, and utilizing word2vec for creating embedding vectors from those pseudo-sentences. In this talk, I will give insights into the idea of RDF2vec, possible application areas, and recently developed variants incorporating different walk strategies and training variations.

New Adventures in RDF2vec

Heiko Paulheim

Knowledge Matters! The Role of Knowledge Graphs in Modern AI Systems

Heiko Paulheim

From Wikis to Knowledge Graphs

Heiko Paulheim

Knowledge Graphs are often used as a symbolic representation mechanism for representing knowledge in data intensive applications, both for integrating corporate knowledge as well as for providing general, cross-domain knowledge in public knowledge graphs such as Wikidata. As such, they have been identified as a useful way of injecting background knowledge in data analysis processes. To fully harness the potential of knowledge graphs, latent representations of entities in the graphs, so called knowledge graph embeddings, show superior performance, but sacrifice one central advantage of knowledge graphs, i.e., the explicit symbolic knowledge representations. In this talk, I will shed some light on the usage of knowledge graphs and embeddings in data analysis, and give an outlook on research directions which aim at combining the best of both worlds.

Using Knowledge Graphs in Data Science - From Symbolic to Latent Representati...

Heiko Paulheim

Starting with Cyc in the 1980s, the collection of general knowledge in machine interpretable form has been considered a valuable ingredient in intelligent and knowledge intensive applications. Notable contributions in the field include the Wikipedia-based datasets DBpedia and YAGO, as well as the collaborative knowledge base Wikidata. Since Google has coined the term in 2012, they are most often referred to as knowledge graphs. Besides such open knowledge graphs, many companies have started using corporate knowledge graphs as a means of information representation. In this talk, I will look at two ongoing projects related to the extraction of knowledge graphs from Wikipedia and other Wikis. The first new dataset, CaLiGraph, aims at the generation of explicit formal definitions from categories, and the extraction of new instances from list pages. In its current release, CaLiGraph contains 200k axioms defining classes, and more than 7M typed instances. In the second part, I will look at the transfer of the DBpedia approach to a multitude of arbitrary Wikis. The first such prototype, DBkWik, extracts data from Fandom, a Wiki farm hosting more than 400k different Wikis on various topics. Unlike DBpedia, which relies on a larger user base for crowdsourcing an explicit schema and extraction rules, and the "one-page-per-entity" assumption, DBkWik has to address various challenges in the fields of schema learning and data integration. In its current release, DBkWik contains more than 11M entities, and has been found to be highly complementary to DBpedia.

Beyond DBpedia and YAGO – The New Kids on the Knowledge Graph Block

Heiko Paulheim

Big Data, Smart Algorithms, and Market Power - A Computer Scientist’s Perspec...

Heiko Paulheim

Machine Learning & Embeddings for Large Knowledge Graphs

Heiko Paulheim

From Wikipedia to Thousands of Wikis – The DBkWik Knowledge Graph

Heiko Paulheim

Big Data, Smart Algorithms, and Market Power - A Computer Scientist's Perspec...

Heiko Paulheim

The original Semantic Web vision foresees to describe entities in a way that the meaning can be interpreted both by machines and humans. Following that idea, large-scale knowledge graphs capturing a significant portion of knowledge have been developed. In the recent past, vector space embeddings of semantic web knowledge graphs - i.e., projections of a knowledge graph into a lower-dimensional, numerical feature space (a.k.a. latent feature space) - have been shown to yield superior performance in many tasks, including relation prediction, recommender systems, or the enrichment of predictive data mining tasks. At the same time, those projections describe an entity as a numerical vector, without any semantics attached to the dimensions. Thus, embeddings are as far from the original Semantic Web vision as can be. As a consequence, the results achieved with embeddings - as impressive as they are in terms of quantitative performance - are most often not interpretable, and it is hard to obtain a justification for a prediction, e.g., an explanation why an item has been suggested by a recommender system. In this paper, we make a claim for semantic embeddings and discuss possible ideas towards their construction.

Make Embeddings Semantic Again!

Heiko Paulheim

Knowledge graphs are used in various applications and have been widely analyzed. A question that is not very well researched is: what is the price of their production? In this paper, we propose ways to estimate the cost of those knowledge graphs. We show that the cost of manually curating a triple is between $2 and $6, and that the cost for automatically created knowledge graphs is by a factor of 15 to 150 cheaper (i.e., 1c to 15c per statement). Furthermore, we advocate for taking cost into account as an evaluation metric, showing the correspondence between cost per triple and semantic validity as an example.

How much is a Triple?

Heiko Paulheim

Large-scale cross-domain knowledge graphs, such as DBpedia or Wikidata, are some of the most popular and widely used datasets of the Semantic Web. In this paper, we introduce some of the most popular knowledge graphs on the Semantic Web. We discuss how machine learning is used to improve those knowledge graphs, and how they can be exploited as background knowledge in popular machine learning tasks, such as recommender systems.

Machine Learning with and for Semantic Web Knowledge Graphs

Heiko Paulheim

The problem of automatic detection of fake news insocial media, e.g., on Twitter, has recently drawn some attention. Although, from a technical perspective, it can be regarded as a straight-forward, binary classification problem, the major challenge is the collection of large enough training corpora, since manual annotation of tweets as fake or non-fake news is an expensive and tedious endeavor. In this paper, we discuss a weakly supervised approach, which automatically collects a large-scale, but very noisy training dataset comprising hundreds of thousands of tweets. During collection, we automatically label tweets by their source, i.e., trustworthy or untrustworthy source, and train a classifier on this dataset. We then use that classifier for a different classification target, i.e., the classification of fake and non-fake tweets. Although the labels are not accurate according to the new classification target (not all tweets by an untrustworthy source need to be fake news, and vice versa), we show that despite this unclean inaccurate dataset, it is possible to detect fake news with an F1 score of up to 0.9.

Weakly Supervised Learning for Fake News Detection on Twitter

Heiko Paulheim

Knowledge Graphs, such as DBpedia, YAGO, or Wikidata, are valuable resources for building intelligent applications like data analytics tools or recommender systems. Understanding what is in those knowledge graphs is a crucial prerequisite for selecing a Knowledge Graph for a task at hand. Hence, Knowledge Graph profiling - i.e., quantifying the structure and contents of knowledge graphs, as well as their differences - is essential for fully utilizing the power of Knowledge Graphs. In this paper, I will discuss methods for Knowledge Graph profiling, depict crucial differences of the big, well-known Knowledge Graphs, like DBpedia, YAGO, and Wikidata, and throw a glance at current developments of new, complementary Knowledge Graphs such as DBkWik and WebIsALOD.

Towards Knowledge Graph Profiling

Heiko Paulheim

Knowledge Graphs on the Web

Heiko Paulheim

DBpedia is a large-scale, cross-domain knowledge graph extracted from Wikipedia. For the extraction, crowd-sourced mappings from Wikipedia infoboxes to the DBpedia ontology are utilized. In this process, different problems may arise: users may create wrong and/or inconsistent mappings, use the ontology in an unforeseen way, or change the ontology without considering all possible consequences. In this paper, we present a data-driven approach to discover problems in mappings as well as in the ontology and its usage in a joint, data-driven process. We show both quantitative and qualitative results about the problems identified, and derive proposals for altering mappings and refactoring the DBpedia ontology.

Data-driven Joint Debugging of the DBpedia Mappings and Ontology

Heiko Paulheim

Ontology reasoning is typically a computationally intensive operation. While soundness and completeness of results is required in some use cases, for many others, a sensible trade-off between computation efforts and correctness of results makes more sense. In this paper, we show that it is possible to approximate a central task in reasoning, i.e., A-box consistency checking, by training a machine learning model which approximates the behavior of that reasoner for a specific ontology. On four different datasets, we show that such learned models constantly achieve an accuracy above 95% at less than 2% of the runtime of a reasoner, using a decision tree with no more than 20 inner nodes. For example, this allows for validating 293M Microdata documents against the schema.org ontology in less than 90 minutes, compared to 18 days required by a state of the art ontology reasoner.

Fast Approximate A-box Consistency Checking using Machine Learning

Heiko Paulheim

Mais de Heiko Paulheim (20)

Knowledge Graph Generation from Wikipedia in the Age of ChatGPT: Knowledge ...

What_do_Knowledge_Graph_Embeddings_Learn.pdf

New Adventures in RDF2vec

Knowledge Matters! The Role of Knowledge Graphs in Modern AI Systems

From Wikis to Knowledge Graphs

Using Knowledge Graphs in Data Science - From Symbolic to Latent Representati...

Beyond DBpedia and YAGO – The New Kids on the Knowledge Graph Block

Big Data, Smart Algorithms, and Market Power - A Computer Scientist’s Perspec...

Machine Learning & Embeddings for Large Knowledge Graphs

From Wikipedia to Thousands of Wikis – The DBkWik Knowledge Graph

Big Data, Smart Algorithms, and Market Power - A Computer Scientist's Perspec...

Make Embeddings Semantic Again!

How much is a Triple?

Machine Learning with and for Semantic Web Knowledge Graphs

Weakly Supervised Learning for Fake News Detection on Twitter

Towards Knowledge Graph Profiling

Knowledge Graphs on the Web

Data-driven Joint Debugging of the DBpedia Mappings and Ontology

Fast Approximate A-box Consistency Checking using Machine Learning

Último

AXA XL - Insurer Innovation Award Americas 2024

The Digital Insurer

As privacy and data protection regulations evolve rapidly, organizations operating in multiple jurisdictions face mounting challenges to ensure compliance and safeguard customer data. With state-specific privacy laws coming up in multiple states this year, it is essential to understand what their unique data protection regulations will require clearly. How will data privacy evolve in the US in 2024? How to stay compliant? Our panellists will guide you through the intricacies of these states' specific data privacy laws, clarifying complex legal frameworks and compliance requirements. This webinar will review: - The essential aspects of each state's privacy landscape and the latest updates - Common compliance challenges faced by organizations operating in multiple states and best practices to achieve regulatory adherence - Valuable insights into potential changes to existing regulations and prepare your organization for the evolving landscape

TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments

TrustArc

MINDCTI Revenue Release Quarter One 2024

MIND CTI

Scaling API-first – The story of a global engineering organization Ian Reasor, Senior Computer Scientist - Adobe Radu Cotescu, Senior Computer Scientist - Adobe Apidays New York 2024: The API Economy in the AI Era (April 30 & May 1, 2024) ------ Check out our conferences at https://www.apidays.global/ Do you want to sponsor or talk at one of our conferences? https://apidays.typeform.com/to/ILJeAaV8 Learn more on APIscene, the global media made by the community for the community: https://www.apiscene.io Explore the API ecosystem with the API Landscape: https://apilandscape.apiscene.io/

Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe

apidays

Manulife - Insurer Transformation Award 2024

The Digital Insurer

EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER

MadyBayot

GenAI Risks & Security Meetup 01052024.pdf

lior mazor

presentation ICT roal in 21st century education

jfdjdjcjdnsjd

Boost Fertility New Invention Ups Success Rates.pdf

sudhanshuwaghmare1

How to Troubleshoot Apps for the Modern Connected Worker

ThousandEyes

MS Copilot expands with MS Graph connectors

Nanddeep Nachan

Artificial Intelligence Chap.5 : Uncertainty

Khushali Kathiriya


DBpediaNYD - A Silver Standard Benchmark Dataset for Semantic Relatedness in DBpedia

1. DBpediaNYD – A Silver Standard Benchmark Dataset for Semantic Relatedness in DBpedia
Heiko Paulheim, 10/22/13

2. Motivation
• There are quite a few approaches to entity ranking/statement weighting on Linked Data
  – and DBpedia in particular
• Examples:
  – Franz et al. (2009) – Tensor Decomposition
  – Meij et al. (2009) – Machine Learning
  – Mirizzi et al. (2010) – Web Search Engines
  – Mulay and Kumar (2011) – Machine Learning
  – Hees et al. (2012) – Crowd Sourcing
  – Nunes et al. (2012) – Social Network Analysis

3. Motivation
• However,
  – none of those have been competitively evaluated
  – none of those have been evaluated at large scale
• Evaluation with
  – small private data sets
  – user studies
• Approaches using Machine Learning
  – require training data
  – which is expensive to obtain

4. The Dataset
• Large-scale dataset (several thousand instances)
  – statements with strengths
• Strength value: Normalized Google Distance (NGD)
• f(x): number of search results containing x
• f(x,y): number of search results containing both x and y
• M: number of pages in search engine index
• NGD has been shown to correlate with human strength associations
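The slide's formula image does not survive in the transcript. For reference, the standard Normalized Google Distance (as defined by Cilibrasi and Vitányi) can be computed from the three quantities above. A minimal Python sketch; the function name `ngd` and the example counts are made up for illustration and are not taken from the dataset:

```python
import math

def ngd(fx: float, fy: float, fxy: float, m: float) -> float:
    """Standard (symmetric) Normalized Google Distance.

    fx, fy -- number of search results containing x (resp. y)
    fxy    -- number of search results containing both x and y
    m      -- number of pages in the search engine index
    """
    log_fx, log_fy, log_fxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(log_fx, log_fy) - log_fxy) / (math.log(m) - min(log_fx, log_fy))

# Hypothetical counts, chosen only to illustrate the call.
example_ngd = ngd(fx=1000, fy=100, fxy=50, m=1_000_000)
print(example_ngd)
```

Swapping `fx` and `fy` gives the same value, which is what makes the measure symmetric; smaller values mean more strongly related terms.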

5. The Dataset
• NGD is a symmetric value
  – NYD dataset also contains asymmetric values
• Asymmetric Normalized Google Distance
• f(x): number of search results containing x
• f(x,y): number of search results containing both x and y
• M: number of pages in search engine index
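The asymmetric formula is likewise missing from the transcript, so the exact definition used for the dataset cannot be confirmed here. The sketch below is only one plausible directional variant, stated as an assumption: the numerator uses the count of the source term x alone, and the denominator uses the count of the target term y. The name `ngd_directed` and the counts are hypothetical.

```python
import math

def ngd_directed(fx: float, fy: float, fxy: float, m: float) -> float:
    """ASSUMED directional variant (x -> y), not confirmed by the slides.

    Small when most pages that mention x also mention y.
    """
    return (math.log(fx) - math.log(fxy)) / (math.log(m) - math.log(fy))

# Hypothetical counts: x is the more frequently mentioned term.
x_to_y = ngd_directed(fx=1000, fy=100, fxy=50, m=1_000_000)
y_to_x = ngd_directed(fx=100, fy=1000, fxy=50, m=1_000_000)
print(x_to_y, y_to_x)
```

Under this assumption, the direction starting from the more frequent term coincides with the symmetric value, while the reverse direction is smaller. That matches the pattern of the John Lennon/Yoko Ono example in the deck, but other variants would match it too.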

6. Constructing the Dataset
• We sampled 10,000 statements
  – with DBpedia resources as subject and object (e.g., no type statements, no literals)
  – with dbpedia or dbpprop predicate
• ...and computed symmetric/asymmetric NGD
  – using the labels as search strings
  – using Yahoo BOSS
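The sampling criteria above can be sketched as a simple filter over (subject, predicate, object) triples. This is an illustration, not the author's code: the helper names are hypothetical, and reading "dbpedia or dbpprop predicate" as the DBpedia ontology and property namespaces is an assumption.

```python
import random

RESOURCE_NS = "http://dbpedia.org/resource/"
# Assumption: "dbpedia or dbpprop predicate" means these two namespaces.
PREDICATE_NS = ("http://dbpedia.org/ontology/", "http://dbpedia.org/property/")

def is_candidate(s: str, p: str, o: str) -> bool:
    """Keep statements linking two DBpedia resources via a DBpedia predicate.

    Type statements drop out because their object is a class, not a resource;
    literals drop out because they do not start with the resource namespace.
    """
    return (s.startswith(RESOURCE_NS)
            and o.startswith(RESOURCE_NS)
            and p.startswith(PREDICATE_NS))

def sample_statements(triples, k, seed=0):
    """Draw up to k candidate statements uniformly at random."""
    candidates = [t for t in triples if is_candidate(*t)]
    return random.Random(seed).sample(candidates, min(k, len(candidates)))
```

Each sampled statement then needs three search engine counts: one per label and one for both labels together.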

7. The Dataset
• Random sample of 10,000 statements
  – i.e., 30,000 search engine calls (80c/1,000 → 24 USD)
• 3,058 pairs of resources had to be discarded
  – f(x)<f(x,y) or f(y)<f(x,y)
  – search engines sometimes don't count properly :-(
• Result:
  – 6,942 weighted statements (symmetric)
  – 13,884 weighted statements (asymmetric)
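The figures on this slide follow from simple arithmetic, reproduced in the Python sketch below together with the consistency check used to discard pairs. The variable and function names are illustrative only.

```python
def is_consistent(fx: int, fy: int, fxy: int) -> bool:
    """A pair is usable only if the joint count exceeds neither single count."""
    return fxy <= fx and fxy <= fy

statements = 10_000
calls = statements * 3                   # f(x), f(y) and f(x,y) per statement
cost_cents = calls * 80 // 1_000         # 80 cents per 1,000 calls -> 24 USD
discarded = 3_058
symmetric_statements = statements - discarded      # one value per pair
asymmetric_statements = symmetric_statements * 2   # one value per direction

print(calls, cost_cents / 100, symmetric_statements, asymmetric_statements)
```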

8. The Dataset
• Example:
  – dbpedia:John_Lennon and dbpedia:Yoko_Ono
• Distances:
  – symmetric: 0.18
  – John Lennon → Yoko Ono: 0.18
  – Yoko Ono → John Lennon: 0.03
• Explanation:
  – Yoko Ono is famous for being John Lennon's wife
    • and most often mentioned in that context
  – John Lennon is more famous for being a member of the Beatles

9. Example: the DBpedia FindRelated Service
• We trained two regression SVMs (LibSVM) based on DBpediaNYD
  – one for symmetric, one for asymmetric
  – service allows for finding the most related among the linked resources
• Example results:
• http://wiki.dbpedia.org/FindRelated

10. Conclusion and Outlook
• DBpediaNYD allows for large scale evaluation
  – rather a silver standard
  – does not replace manually created gold standards
• Future work
  – validate DBpediaNYD with users
  – compare search engines

11. Something Completely Different
• Challenges enumerated in the workshop intro this morning
  – “Logical inference on noisy data”
• Talk on “Type Inference on Noisy RDF Data”
  – Was actually applied for DBpedia 3.9
  – Friday, 3:15, Bayside 204A

