For Details, Contact TSYS Academic Projects.
Ph: 9841103123, 044-42607879, Website: http://www.tsys.co.in/
Mail Id: tsysglobalsolutions2014@gmail.com.
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING
A Simple Message-Optimal Algorithm for Random Sampling from a Distributed Stream
Abstract - We present a simple, message-optimal algorithm for maintaining a random sample
from a large data stream whose input elements are distributed across multiple sites that
communicate via a central coordinator. At any point in time, the set of elements held by the
coordinator represents a uniform random sample from the set of all the elements observed so far.
When compared with prior work, our algorithms asymptotically improve the total number of
messages sent in the system. We present a matching lower bound, showing that our protocol
sends the optimal number of messages up to a constant factor with large probability. We also
consider the important case when the distribution of elements across different sites is non-
uniform, and show that for such inputs, our algorithm significantly outperforms prior solutions.
IEEE Transactions on Knowledge and Data Engineering (June 1 2016)
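The coordinator-based sampling described in the abstract above can be illustrated with a small sketch. It uses one common technique, which is an assumption here and not necessarily the paper's protocol: every element draws a random weight, the coordinator keeps the elements with the smallest weights, and a site sends a message only when a weight beats the last threshold it heard. The names `Coordinator`, `Site`, and `run_demo` are hypothetical.

```python
import heapq
import random


class Coordinator:
    """Holds the sample: the elements with the smallest random weights so far."""

    def __init__(self, sample_size):
        self.sample_size = sample_size
        self._heap = []  # max-heap on weight, stored as (-weight, element)

    def threshold(self):
        # Until the sample is full, every element is accepted.
        if len(self._heap) < self.sample_size:
            return 1.0
        return -self._heap[0][0]

    def receive(self, weight, element):
        """Handle one message from a site and reply with the current threshold."""
        if len(self._heap) < self.sample_size:
            heapq.heappush(self._heap, (-weight, element))
        elif weight < self.threshold():
            # Evict the element with the largest weight and keep the new one.
            heapq.heapreplace(self._heap, (-weight, element))
        return self.threshold()

    def sample(self):
        return [element for _, element in self._heap]


class Site:
    """Forwards an element only if its weight beats the last threshold it heard."""

    def __init__(self, coordinator, rng):
        self.coordinator = coordinator
        self.rng = rng
        self.known_threshold = 1.0  # may be stale, but is never too small
        self.messages_sent = 0

    def observe(self, element):
        weight = self.rng.random()
        if weight < self.known_threshold:
            self.messages_sent += 1
            self.known_threshold = self.coordinator.receive(weight, element)


def run_demo(num_sites=3, num_elements=1000, sample_size=5, seed=0):
    """Feed elements round-robin to the sites; return (sample, total messages)."""
    rng = random.Random(seed)
    coordinator = Coordinator(sample_size)
    sites = [Site(coordinator, rng) for _ in range(num_sites)]
    for element in range(num_elements):
        sites[element % num_sites].observe(element)
    return coordinator.sample(), sum(site.messages_sent for site in sites)
```

Because the coordinator's threshold only ever decreases, a site working from a stale threshold may send an unnecessary message but never withholds an element that belongs in the sample.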
Online Learning from Trapezoidal Data Streams
Abstract - In this paper, we study a new problem of continuous learning from doubly-streaming
data where both data volume and feature space increase over time. We refer to the doubly-
streaming data as trapezoidal data streams and the corresponding learning problem as online
learning from trapezoidal data streams. The problem is challenging because both data volume
and data dimension increase over time, and existing online learning [1] [2], online feature
selection [3], and streaming feature selection algorithms [4] [5] are inapplicable. We propose a
new Online Learning with Streaming Features algorithm (OLSF for short) and its two variants
that combine online learning [1] [2] and streaming feature selection [4] [5] to enable learning
from trapezoidal data streams with infinite training instances and features. Specifically, when a
new training instance carrying new features arrives, a classifier updates the existing features by
following the passive-aggressive update rule [2] and updates the new features by following the
structural risk minimization principle. Then, feature sparsity is introduced by using the projected
truncation technique. We derive performance bounds of the OLSF algorithm and its variants. We
also conduct experiments on real-world data sets to show the performance of the proposed
algorithms.
IEEE Transactions on Knowledge and Data Engineering (May 2016)
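The passive-aggressive update rule mentioned in the abstract above can be sketched as follows. This is the standard binary passive-aggressive step, not the OLSF algorithm itself: giving newly arrived features a starting weight of zero is an assumption made for illustration, and the function name `pa_update` is hypothetical.

```python
def pa_update(weights, features, label):
    """One passive-aggressive step for a binary label in {-1, +1}."""
    size = max(len(weights), len(features))
    # Assumption: features never seen before start with weight 0.
    w = list(weights) + [0.0] * (size - len(weights))
    x = list(features) + [0.0] * (size - len(features))

    margin = label * sum(wi * xi for wi, xi in zip(w, x))
    loss = max(0.0, 1.0 - margin)  # hinge loss
    norm_sq = sum(xi * xi for xi in x)
    if loss == 0.0 or norm_sq == 0.0:
        return w  # passive: the margin is already satisfied
    tau = loss / norm_sq  # aggressive: smallest step that restores the margin
    return [wi + tau * label * xi for wi, xi in zip(w, x)]
```

After an aggressive step the updated weights classify the current instance with a margin of exactly 1 (up to rounding), which is the defining property of the rule.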
Quality-Aware Subgraph Matching Over Inconsistent Probabilistic Graph Databases
Abstract - Resource Description Framework (RDF) has been widely used in the Semantic Web to
describe resources and their relationships. The RDF graph is one of the most commonly used
representations for RDF data. However, in many real applications such as the data
extraction/integration, RDF graphs integrated from different data sources may often contain
uncertain and inconsistent information (e.g., uncertain labels or labels that violate facts/rules), due to
the unreliability of data sources. In this paper, we formalize the RDF data by inconsistent
probabilistic RDF graphs, which contain both inconsistencies and uncertainty. With such a
probabilistic graph model, we focus on an important problem, quality-aware subgraph matching
over inconsistent probabilistic RDF graphs (QA-gMatch), which retrieves subgraphs from
inconsistent probabilistic RDF graphs that are isomorphic to a given query graph and with high
quality scores (considering both consistency and uncertainty). In order to efficiently answer QA-
gMatch queries, we provide two effective pruning methods, namely adaptive label pruning and
quality score pruning, which can greatly filter out false alarms of subgraphs. We also design an
effective index to facilitate our proposed pruning methods, and propose an efficient approach for
processing QA-gMatch queries. Finally, we demonstrate the efficiency and effectiveness of our
proposed approaches through extensive experiments.
IEEE Transactions on Knowledge and Data Engineering (June 1 2016)
CavSimBase: A Database for Large Scale Comparison of Protein Binding Sites
Abstract - CavBase is a database containing information about the three-dimensional geometry
and the physicochemical properties of putative protein binding sites. Analyzing CavBase data
typically involves computing the similarity of pairs of binding sites. In contrast to sequence
alignment, however, a structural comparison of protein binding sites is a computationally
challenging problem, making large scale studies difficult or even infeasible. One possibility to
overcome this obstacle is to precompute pairwise similarities in an all-against-all comparison,
and to make these similarities subsequently accessible to data analysis methods. Pairwise
similarities, once being computed, can also be used to equip CavBase with a neighborhood
structure. Taking advantage of this structure, methods for problems such as similarity retrieval
can be implemented efficiently. In this paper, we tackle the problem of performing an all-
against-all comparison using CavBase, consisting of more than 200,000 protein cavities, by
means of parallel computation and cloud computing techniques. We present the conceptual
design and technical realization of a large-scale study to create a similarity database called
CavSimBase. We illustrate how CavSimBase is constructed, is accessed, and is used to answer
biological questions by data analysis and similarity retrieval.
IEEE Transactions on Knowledge and Data Engineering (June 1 2016)
Online Subgraph Skyline Analysis over Knowledge Graphs
Abstract - Subgraph search is very useful in many real-world applications. However, users may
be overwhelmed by the masses of matches. In this paper, we propose a subgraph skyline analysis
problem, denoted as $S^2A$, to support more complicated analysis over graph data.
Specifically, given a large graph $G$ and a query graph $q$, we want to find all the subgraphs
$g$ in $G$, such that $g$ is graph isomorphic to $q$ and not dominated by any other subgraphs.
In order to improve the efficiency, we devise a hybrid feature encoding incorporating both
structural and numeric features based on a partitioning strategy, and discuss how to optimize the
space partitioning. We also present a skylayer index to facilitate the dynamic subgraph skyline
computation. Moreover, an attribute cluster-based method is proposed to deal with the curse of
dimensionality. Extensive experiments over real datasets confirm the effectiveness and
efficiency of our algorithm.
IEEE Transactions on Knowledge and Data Engineering (July 1 2016)
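The notion of a match that is "not dominated by any other" in the abstract above can be made concrete with a generic skyline filter over numeric vectors. This sketch does no subgraph matching and uses none of the paper's indexes; it assumes each match has already been reduced to a tuple of scores where smaller is better, and the names `dominates` and `skyline` are hypothetical.

```python
def dominates(a, b):
    """True if a is at least as good as b everywhere and strictly better somewhere.

    Assumption: smaller values are better in every dimension.
    """
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline(candidates):
    """Keep the candidates that no other candidate dominates (quadratic scan)."""
    return [c for c in candidates if not any(dominates(o, c) for o in candidates)]


matches = [(1, 5), (2, 2), (3, 3), (5, 1), (4, 4)]
result = skyline(matches)  # (3, 3) and (4, 4) are dominated by (2, 2)
```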
K Nearest Neighbour Joins for Big Data on MapReduce: a Theoretical and Experimental
Analysis
Abstract - Given a point p and a set of points S, the kNN operation finds the k closest points to p
in S. It is a computationally intensive task with a large range of applications such as knowledge
discovery or data mining. However, as the volume and the dimension of data increase, only
distributed approaches can perform such a costly operation in a reasonable time. Recent works
have focused on implementing efficient solutions using the MapReduce programming model
because it is suitable for distributed large scale data processing. Although these works provide
different solutions to the same problem, each one has particular constraints and properties. In this
paper, we compare the different existing approaches for computing kNN on MapReduce, first
theoretically, and then by performing an extensive experimental evaluation. To be able to
compare solutions, we identify three generic steps for kNN computation on MapReduce: data
pre-processing, data partitioning and computation. We then analyze each step from load
balancing, accuracy and complexity aspects. Experiments in this paper use a variety of datasets,
and analyze the impact of data volume, data dimension and the value of k from many
perspectives like time and space complexity, and accuracy. The experimental part brings new
advantages and shortcomings that are discussed for each algorithm. To the best of our
knowledge, this is the first paper that compares kNN computing methods on MapReduce both
theoretically and experimentally with the same setting. Overall, this paper can be used as a guide
to tackle kNN-based practical problems in the context of big data.
IEEE Transactions on Knowledge and Data Engineering (May 2016)
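The kNN operation defined in the abstract above can be sketched on a single machine as a brute-force baseline. This is not one of the MapReduce methods the paper compares: it assumes Euclidean distance and data that fits in memory, and the names `knn` and `knn_join` are hypothetical.

```python
import heapq
import math


def knn(query, points, k):
    """Return the k points closest to query, nearest first (Euclidean distance)."""
    return heapq.nsmallest(k, points, key=lambda p: math.dist(query, p))


def knn_join(queries, points, k):
    """For every query point, find its k nearest neighbours in points."""
    return {q: knn(q, points, k) for q in queries}


points = [(0, 0), (1, 1), (5, 5), (2, 2)]
nearest = knn((0, 0), points, 2)
joined = knn_join([(0, 0), (5, 4)], points, 1)
```

Every query is compared against every point, so the cost grows with the product of the two set sizes; this is the cost that distributed approaches try to split across machines.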
ATD: Anomalous Topic Discovery in High Dimensional Discrete Data
Abstract - We propose an algorithm for detecting patterns exhibited by anomalous clusters in
high dimensional discrete data. Unlike most anomaly detection (AD) methods, which detect
individual anomalies, our proposed method detects groups (clusters) of anomalies; i.e. sets of
points which collectively exhibit abnormal patterns. In many applications this can lead to better
understanding of the nature of the atypical behavior and to identifying the sources of the
anomalies. Moreover, we consider the case where the atypical patterns exhibit on only a small
(salient) subset of the very high dimensional feature space. Individual AD techniques and
techniques that detect anomalies using all the features typically fail to detect such anomalies, but
our method can detect such instances collectively, discover the shared anomalous patterns
exhibited by them, and identify the subsets of salient features. In this paper, we focus on
detecting anomalous topics in a batch of text documents, developing our algorithm based on
topic models. Results of our experiments show that our method can accurately detect anomalous
topics and salient features (words) under each such topic in a synthetic data set and two real-
world text corpora and achieves better performance compared to both standard group AD and
individual AD techniques. All required code to reproduce our experiments is available from
https://github.com/hsoleimani/ATD.
IEEE Transactions on Knowledge and Data Engineering (May 2016)
Multilabel Classification via Co-evolutionary Multilabel Hypernetwork
Abstract - Multilabel classification is prevalent in many real-world applications where data
instances may be associated with multiple labels simultaneously. In multilabel classification,
exploiting label correlations is an essential but nontrivial task. Most of the existing multilabel
learning algorithms are either ineffective or computationally demanding and less scalable in
exploiting label correlations. In this paper, we propose a co-evolutionary multilabel
hypernetwork (Co-MLHN) as an attempt to exploit label correlations in an effective and efficient
way. To this end, we firstly convert the traditional hypernetwork into a multilabel hypernetwork
(MLHN) where label correlations are explicitly represented. We then propose a co-evolutionary
learning algorithm to learn an integrated classification model for all labels. The proposed Co-
MLHN exploits arbitrary order label correlations and has linear computational complexity with
respect to the number of labels. Empirical studies on a broad range of multilabel data sets
demonstrate that Co-MLHN achieves competitive results against state-of-the-art multilabel
learning algorithms, in terms of both classification performance and scalability with respect to
the number of labels.
IEEE Transactions on Knowledge and Data Engineering (May 2016)
Learning to Find Topic Experts in Twitter via Different Relations
Abstract - Expert finding has become a hot topic along with the flourishing of social networks,
such as micro-blogging services like Twitter. Finding experts in Twitter is an important problem
because tweets from experts are valuable sources that carry rich information (e.g., trends) in
various domains. However, previous methods cannot be directly applied to the Twitter expert
finding problem. Recently, several attempts use the relations among users and Twitter Lists for
expert finding. Nevertheless, these approaches only partially utilize such relations. To this end,
we develop a probabilistic method to jointly exploit three types of relations (i.e., follower
relation, user-list relation, and list-list relation) for finding experts. Specifically, we propose a
Semi-Supervised Graph-based Ranking approach (SSGR) to offline calculate the global
authority of users. In SSGR, we employ a normalized Laplacian regularization term to
jointly explore the three relations, which is subject to the supervised information derived from
Twitter crowds. We then online compute the local relevance between users and the given query.
By leveraging the global authority and local relevance of users, we rank all of users and find top-
N users with highest ranking scores. Experiments on real-world data demonstrate the
effectiveness of our proposed approach for topic-specific expert finding in Twitter.
IEEE Transactions on Knowledge and Data Engineering (July 2016)
Analytic Queries over Geospatial Time-Series Data Using Distributed Hash Tables
Abstract - As remote sensing equipment and networked observational devices continue to
proliferate, their corresponding data volumes have surpassed the storage and processing
capabilities of commodity computing hardware. This trend has led to the development of
distributed storage frameworks that incrementally scale out by assimilating resources as
necessary. While challenging in its own right, storing and managing voluminous datasets is only
the precursor to a broader field of research: extracting insights, relationships, and models from
the underlying datasets. The focus of this study is twofold: exploratory and predictive analytics
over voluminous, multidimensional datasets in a distributed environment. Both of these types of
analysis represent a higher-level abstraction over standard query semantics; rather than indexing
every discrete value for subsequent retrieval, our framework autonomously learns the
relationships and interactions between dimensions in the dataset and makes the information
readily available to users. This functionality includes statistical synopses, correlation analysis,
hypothesis testing, probabilistic structures, and predictive models that not only enable the
discovery of nuanced relationships between dimensions, but also allow future events and trends
to be predicted. The algorithms presented in this work were evaluated empirically on a real-
world geospatial time-series dataset in a production environment, and are broadly applicable
across other storage frameworks.
IEEE Transactions on Knowledge and Data Engineering (June 2016)
RSkNN: kNN Search on Road Networks by Incorporating Social Influence
Abstract - Although kNN search on a road network, i.e., finding the k nearest objects to a query user on
the network, has been extensively studied, existing works neglected the fact that the user's social information can
play an important role in this kNN query. Many real-world applications, such as location-based
social networking services, require such a query. In this paper, we study a new problem: kNN
search on road networks by incorporating social influence (RSkNN). Specifically, the state-of-
the-art Independent Cascade (IC) model in social network is applied to define social influence.
One critical challenge of the problem is to speed up the computation of the social influence over
large road and social networks. To address this challenge, we propose three efficient index-based
search algorithms, i.e., road network-based (RN-based), social network-based (SN-based), and
hybrid indexing algorithms. In the RN-based algorithm, we employ a filtering-and-verification
framework for tackling the hard problem of computing social influence. In the SN-based
algorithm, we embed social cuts into the index, so that we speed up the query. In the hybrid
algorithm, we propose an index, summarizing the road and social networks, based on which we
can obtain query answers efficiently. Finally, we use real road and social network data to
empirically verify the efficiency and efficacy of our solutions.
IEEE Transactions on Knowledge and Data Engineering (June 2016)
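The Independent Cascade model named in the abstract above can be sketched with a plain Monte Carlo simulation. This only illustrates the model and includes none of the paper's index-based algorithms: the graph layout (a dict mapping each node to `(neighbor, probability)` pairs) and the names `independent_cascade` and `estimate_influence` are assumptions.

```python
import random


def independent_cascade(graph, seeds, rng):
    """Run one cascade; return the set of nodes that end up active.

    Each newly activated node gets exactly one chance to activate each
    inactive out-neighbour, succeeding with the probability on that edge.
    """
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for node in frontier:
            for neighbor, prob in graph.get(node, []):
                if neighbor not in active and rng.random() < prob:
                    active.add(neighbor)
                    next_frontier.append(neighbor)
        frontier = next_frontier
    return active


def estimate_influence(graph, seeds, runs=1000, seed=0):
    """Average number of active nodes over repeated cascades."""
    rng = random.Random(seed)
    total = sum(len(independent_cascade(graph, seeds, rng)) for _ in range(runs))
    return total / runs


# Hypothetical toy graph: edges with probability 1.0 always fire, 0.0 never do.
toy_graph = {"a": [("b", 1.0), ("c", 0.0)], "b": [("d", 1.0)]}
```

Repeated simulation like this is expensive on large networks, which is the cost the abstract identifies as the critical challenge.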
Unsupervised Visual Hashing with Semantic Assistant for Content-based Image Retrieval
Abstract - As an emerging technology to support scalable content-based image retrieval (CBIR),
hashing has recently received great attention and become a very active research domain. In
this study, we propose a novel unsupervised visual hashing approach called semantic-assisted
visual hashing (SAVH). Distinguished from semi-supervised and supervised visual hashing, its
core idea is to effectively extract the rich semantics latently embedded in auxiliary texts of
images to boost the effectiveness of visual hashing without any explicit semantic labels. To
achieve the target, a unified unsupervised framework is developed to learn hash codes by
simultaneously preserving visual similarities of images, integrating the semantic assistance from
auxiliary texts on modeling high-order relationships of inter-images and characterizing the
correlations between images and shared topics. Our performance study on three publicly
available image collections: Wiki, MIR Flickr, and NUS-WIDE indicates that SAVH can
achieve superior performance over several state-of-the-art techniques.
IEEE Transactions on Knowledge and Data Engineering (May 2016)
A Scalable Data Chunk Similarity based Compression Approach for Efficient Big Sensing
Data Processing on Cloud
Abstract - Big sensing data is prevalent in both industry and scientific research applications
where the data is generated with high volume and velocity. Cloud computing provides a
promising platform for big sensing data processing and storage as it provides a flexible stack of
massive computing, storage, and software services in a scalable manner. Current big sensing data
processing on Cloud has adopted some data compression techniques. However, due to the high
volume and velocity of big sensing data, traditional data compression techniques lack sufficient
efficiency and scalability for data processing. Based on specific on-Cloud data compression
requirements, we propose a novel scalable data compression approach based on calculating
similarity among the partitioned data chunks. Instead of compressing basic data units, the
compression will be conducted over partitioned data chunks. To restore original data sets, some
restoration functions and predictions will be designed. MapReduce is used for algorithm
implementation to achieve extra scalability on Cloud. With real world meteorological big
sensing data experiments on U-Cloud platform, we demonstrate that the proposed scalable
compression approach based on data chunk similarity can significantly improve data
compression efficiency with affordable data accuracy loss.
IEEE Transactions on Knowledge and Data Engineering (February 2016)
Network Motif Discovery: A GPU Approach
Abstract - The identification of network motifs has important applications in numerous domains,
such as pattern detection in biological networks and graph analysis in digital circuits. However,
mining network motifs is computationally challenging, as it requires enumerating subgraphs
from a real-life graph, and computing the frequency of each subgraph in a large number of
random graphs. In particular, existing solutions often require days to derive network motifs from
biological networks with only a few thousand vertices. To address this problem, this paper
presents a novel study on network motif discovery using Graphical Processing Units (GPUs).
The basic idea is to employ GPUs to parallelize a large number of subgraph matching tasks in
computing subgraph frequencies from random graphs, so as to reduce the overall computation
time of network motif discovery. We explore the design space of GPU-based subgraph matching
algorithms, with careful analysis of several crucial factors (such as branch divergences and
memory coalescing) that affect the performance of GPU programs. Based on our analysis, we
develop a GPU-based solution that (i) considerably differs from existing CPU-based methods in
how it enumerates subgraphs, and (ii) exploits the strengths of GPUs in terms of parallelism
while mitigating their limitations in terms of the computation power per GPU core. With
extensive experiments on a variety of biological networks, we show that our solution is up to two
orders of magnitude faster than the best CPU-based approach, and is around 20 times more cost-
effective than the latter, when taking into account the monetary costs of the CPU and GPUs used.
IEEE Transactions on Knowledge and Data Engineering (May 2016)
Crowdsourced Data Management: A Survey
Abstract - Some important data management and analytics tasks cannot be completely addressed
by automated processes. These “computer-hard” tasks such as entity resolution, sentiment
analysis, and image recognition, can be enhanced through the use of human cognitive ability.
Human Computation is an effective way to address such tasks by harnessing the capabilities of
crowd workers (i.e., the crowd). Thus, crowdsourced data management has become an area of
increasing interest in research and industry. There are three important problems in crowdsourced
data management. (1) Quality Control: Workers may return noisy results and effective
techniques are required to achieve high quality; (2) Cost Control: The crowd is not free, and cost
control aims to reduce the monetary cost; (3) Latency Control: The human workers can be slow,
particularly in contrast to computing time scales, so latency-control techniques are required.
There has been significant work addressing these three factors for designing crowdsourced tasks,
developing crowdsourced data manipulation operators, and optimizing plans of multiple
operators. In this paper, we survey and synthesize a wide spectrum of existing studies on
crowdsourced data management. Based on this analysis we then outline key factors that need to
be considered to improve crowdsourced data management.
IEEE Transactions on Knowledge and Data Engineering (February 2016)
Resolving Multi-Party Privacy Conflicts in Social Media
Abstract - Items shared through Social Media may affect more than one user's privacy—e.g.,
photos that depict multiple users, comments that mention multiple users, events in which
multiple users are invited, etc. The lack of multi-party privacy management support in current
mainstream Social Media infrastructures makes users unable to appropriately control to whom
these items are actually shared or not. Computational mechanisms that are able to merge the
privacy preferences of multiple users into a single policy for an item can help solve this problem.
However, merging multiple users’ privacy preferences is not an easy task, because privacy
preferences may conflict, so methods to resolve conflicts are needed. Moreover, these methods
need to consider how users would actually reach an agreement about a solution to the conflict in
order to propose solutions that can be acceptable by all of the users affected by the item to be
shared. Current approaches are either too demanding or only consider fixed ways of aggregating
privacy preferences. In this paper, we propose the first computational mechanism to resolve
conflicts for multi-party privacy management in Social Media that is able to adapt to different
situations by modelling the concessions that users make to reach a solution to the conflicts. We
also present results of a user study in which our proposed mechanism outperformed other
existing approaches in terms of how many times each approach matched users’ behaviour.
IEEE Transactions on Knowledge and Data Engineering (July 2016)
Interactive Visualization of Large Data Sets
Abstract - Visualization provides a powerful means for data analysis. But to be practical, visual
analytics tools must support smooth and flexible use of visualizations at a fast rate. This becomes
increasingly onerous with the ever-increasing size of real-world datasets. First, large databases
make interaction more difficult once query response time exceeds several seconds. Second, any
attempt to show all data points will overload the visualization, resulting in chaos that will only
confuse the user. Over the last few years substantial effort has been put into addressing both of
these issues and many innovative solutions have been proposed. Indeed, data visualization is a
topic that is too large to be addressed in a single survey paper. Thus, we restrict our attention
here to interactive visualization of large data sets. Our focus then is skewed in a natural way
towards the query processing problem - provided by an underlying database system - rather than to
the actual data visualization problem.
IEEE Transactions on Knowledge and Data Engineering (April 2016)
Improving Construction of Conditional Probability Tables for Ranked Nodes in Bayesian
Networks
Abstract - This paper elaborates on the ranked nodes method (RNM) that is used for constructing
conditional probability tables (CPTs) for Bayesian networks consisting of a class of nodes called
ranked nodes. Such nodes typically represent continuous quantities that lack well-established
interval scales and are hence expressed by ordinal scales. Based on expert elicitation, the CPT of
a child node is generated in RNM by aggregating weighted states of parent nodes with a weight
expression. RNM is also applied to nodes that are expressed by interval scales. However, the use
of the method in this way may be ineffective due to challenges which are not addressed in the
existing literature but are demonstrated through an illustrative example in this paper. To
overcome the challenges, the paper introduces a novel approach that facilitates the use of RNM.
It consists of guidelines concerning the discretization of the interval scales into ordinal ones and
the determination of a weight expression and weights based on assessments of the expert about
the mode of the child node. The determination is premised on interpretations and feasibility
conditions of the weights derived in the paper. The utilization of the approach is demonstrated
with the illustrative example throughout the paper.
IEEE Transactions on Knowledge and Data Engineering (July 2016)
Clearing Contamination in Large Networks
Abstract - In this work, we study the problem of clearing contamination spreading through a
large network where we model the problem as a graph searching game. The problem can be
summarized as constructing a search strategy that will leave the graph clear of any contamination
at the end of the searching process in as few steps as possible. We show that this problem is NP-
hard even on directed acyclic graphs and provide an efficient approximation algorithm. We
experimentally observe the performance of our approximation algorithm in relation to the lower
bound on several large online networks including Slashdot, Epinions, and Twitter.
IEEE Transactions on Knowledge and Data Engineering (June 2016)
Private Over-threshold Aggregation Protocols over Distributed Databases
Abstract - In this paper, we revisit the private over-threshold data aggregation problem. We
formally define the problem’s security requirements as both data and user privacy goals. To
achieve both goals, and to strike a balance between efficiency and functionality, we devise an
efficient cryptographic construction and its proxy-based variant. Both schemes are provably
secure in the semi-honest model. Our key idea for the constructions and their malicious variants
is to compose two encryption functions tightly coupled in a way that the two functions are
commutative and one public-key encryption has an additive homomorphism. We call that double
encryption. We analyze the computational and communication complexities of our construction,
and show that it is much more efficient than the existing protocols in the literature. Specifically,
our protocol has linear complexity in computation and communication with respect to the
number of users. Its round complexity is also linear in the number of users. Finally, we show that
our basic protocol is efficiently transformed into a stronger protocol secure in the presence of
malicious adversaries, and provide the resulting protocol’s performance and security analysis.
IEEE Transactions on Knowledge and Data Engineering (May 2016)
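The additive homomorphism mentioned in the abstract above can be illustrated with a toy Paillier cryptosystem. The abstract does not name the encryption scheme, so Paillier is an assumption chosen only because it is a well-known additively homomorphic scheme. The tiny primes make this sketch insecure, and the function names are hypothetical.

```python
import math
import random


def keygen(p=61, q=53):
    """Toy key generation with tiny primes (illustration only, not secure)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # valid because the generator is fixed to g = n + 1
    return n, (lam, mu)


def encrypt(n, message, rng):
    """Encrypt an integer 0 <= message < n under public key n."""
    n_sq = n * n
    while True:
        r = rng.randrange(1, n)
        if math.gcd(r, n) == 1:  # the blinding value must be invertible mod n
            break
    return (pow(n + 1, message, n_sq) * pow(r, n, n_sq)) % n_sq


def decrypt(n, private_key, ciphertext):
    lam, mu = private_key
    u = pow(ciphertext, lam, n * n)
    return ((u - 1) // n * mu) % n


def add_encrypted(n, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts modulo n."""
    return (c1 * c2) % (n * n)
```

With this property, values can be summed while still encrypted, and only the holder of the private key learns the total. How the total is then compared against a threshold is specific to the paper and is not shown here.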
Challenges in Data Crowdsourcing
Abstract - Crowdsourcing refers to solving large problems by involving human workers that
solve component sub-problems or tasks. In data crowdsourcing, the problem involves data
acquisition, management, and analysis. In this paper, we provide an overview of data
crowdsourcing, giving examples of problems that the authors have tackled, and presenting the
key design steps involved in implementing a crowdsourced solution. We also discuss some of the
open challenges that remain to be solved.
IEEE Transactions on Knowledge and Data Engineering (April 2016)
Efficient R-Tree Based Indexing Scheme for Server-Centric Cloud Storage System
Abstract - Cloud storage system poses new challenges to the community to support efficient
concurrent querying tasks for various data-intensive applications, where indices always hold
important positions. In this paper, we explore a practical method to construct a two-layer
indexing scheme for multi-dimensional data in diverse server-centric cloud storage system. We
first propose RT-HCN, an indexing scheme integrating R-tree based indexing structure and
HCN-based routing protocol. RT-HCN organizes storage and compute nodes into an HCN
overlay, one of the newly proposed server-centric data center topologies. Based on the properties
of HCN, we design a specific index mapping technique to maintain layered global indices and
corresponding query processing algorithms to support efficient query tasks. Then, we expand the
idea of RT-HCN onto another server-centric data center topology DCell, discovering a potential
generalized and feasible way of deploying two-layer indexing schemes on other server-centric
networks. Furthermore, we prove theoretically that RT-HCN is both space-efficient and query-
efficient, by which each node actually maintains a tolerable number of global indices while high
concurrent queries can be processed within accepted overhead. We finally conduct targeted
experiments on Amazon's EC2 platforms, comparing our design with RT-CAN, a similar
indexing scheme for traditional P2P network. The results validate the query efficiency, especially
the speedup of point query of RT-HCN, depicting its potential applicability in future data
centers.
IEEE Transactions on Knowledge and Data Engineering (June 2016)
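The two-layer idea described in the abstract above (a global layer that routes a query to candidate servers, and a local layer on each server that answers it) can be illustrated with a minimal sketch. This is not RT-HCN itself: the class names `Rect`, `LocalIndex`, and `TwoLayerIndex` are hypothetical, the local index is a plain list scan standing in for an R-tree, and the global layer is a single dictionary rather than indices distributed over an HCN overlay.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle, used as a minimum bounding rectangle (MBR)."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def contains(self, x: float, y: float) -> bool:
        return self.xmin <= x <= self.xmax and self.ymin <= y <= self.ymax

    def intersects(self, other: "Rect") -> bool:
        return not (other.xmin > self.xmax or other.xmax < self.xmin
                    or other.ymin > self.ymax or other.ymax < self.ymin)


@dataclass
class LocalIndex:
    """Per-server index. Assumption: a list scan stands in for an R-tree."""
    points: list = field(default_factory=list)

    def insert(self, x: float, y: float) -> None:
        self.points.append((x, y))

    def mbr(self) -> Rect:
        xs = [p[0] for p in self.points]
        ys = [p[1] for p in self.points]
        return Rect(min(xs), min(ys), max(xs), max(ys))

    def range_query(self, rect: Rect) -> list:
        return [p for p in self.points if rect.contains(*p)]


class TwoLayerIndex:
    """Global layer maps each server to its MBR; local layer holds the data."""

    def __init__(self) -> None:
        self.local = {}          # server_id -> LocalIndex
        self.global_index = {}   # server_id -> Rect published by that server

    def insert(self, server_id: str, x: float, y: float) -> None:
        node = self.local.setdefault(server_id, LocalIndex())
        node.insert(x, y)
        # Republish the server's MBR to the global layer after each insert.
        self.global_index[server_id] = node.mbr()

    def range_query(self, rect: Rect):
        visited, hits = [], []
        for server_id, mbr in self.global_index.items():
            if mbr.intersects(rect):      # global layer prunes servers
                visited.append(server_id)
                hits.extend(self.local[server_id].range_query(rect))
        return sorted(hits), visited


index = TwoLayerIndex()
for sid, pts in {"s1": [(1, 1), (2, 2)], "s2": [(10, 10), (12, 11)]}.items():
    for x, y in pts:
        index.insert(sid, x, y)

hits, visited = index.range_query(Rect(0, 0, 3, 3))
print(hits, visited)  # [(1, 1), (2, 2)] ['s1']
```

In this sketch only server `s1` is contacted for the query, because the MBR published by `s2` does not intersect the query rectangle. That pruning step is the part of the design the global layer is meant to provide.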
SUPPORT OFFERED TO REGISTERED STUDENTS:
1. IEEE base paper.
2. Review material as per the individual's university guidelines.
3. Future enhancement.
4. Assistance in answering all critical questions.
5. Training on the programming language.
6. Complete source code.
7. Final report / document.
8. International conference / international journal publication on your project.
FOLLOW US ON FACEBOOK @ TSYS Academic Projects

IEEE Datamining 2016 Title and Abstract

  • 1. For Details, Contact TSYS Academic Projects. Ph: 9841103123, 044-42607879, Website: http://www.tsys.co.in/ Mail Id: tsysglobalsolutions2014@gmail.com. IEEE TRANSACTION ON KNOWLEDGE AND DATA ENGINEERING A Simple Message-Optimal Algorithm for Random Sampling from a Distributed Stream Abstract - We present a simple, message-optimal algorithm for maintaining a random sample from a large data stream whose input elements are distributed across multiple sites that communicate via a central coordinator. At any point in time, the set of elements held by the coordinator represent a uniform random sample from the set of all the elements observed so far. When compared with prior work, our algorithms asymptotically improve the total number of messages sent in the system. We present a matching lower bound, showing that our protocol sends the optimal number of messages up to a constant factor with large probability. We also consider the important case when the distribution of elements across different sites is non- uniform, and show that for such inputs, our algorithm significantly outperforms prior solutions. IEEE Transactions on Knowledge and Data Engineering (June 1 2016) Online Learning from Trapezoidal Data Streams Abstract - In this paper, we study a new problem of continuous learning from doubly-streaming data where both data volume and feature space increase over time. We refer to the doubly- streaming data as trapezoidal data streams and the corresponding learning problem as online learning from trapezoidal data streams. The problem is challenging because both data volume and data dimension increase over time, and existing online learning [1] [2], online feature selection [3], and streaming feature selection algorithms [4] [5] are inapplicable. 
We propose a new Online Learning with Streaming Features algorithm (OLSF for short) and its two variants that combine online learning [1] [2] and streaming feature selection [4] [5] to enable learning from trapezoidal data streams with infinite training instances and features. Specifically, when a new training instance carrying new features arrives, a classifier updates the existing features by following the passive-aggressive update rule [2] and updates the new features by following the structural risk minimization principle. Then, feature sparsity is introduced by using the projected truncation technique. We derive performance bounds of the OLSF algorithm and its variants. We also conduct experiments on real-world data sets to show the performance of the proposed algorithms.
  • 2. For Details, Contact TSYS Academic Projects. Ph: 9841103123, 044-42607879, Website: http://www.tsys.co.in/ Mail Id: tsysglobalsolutions2014@gmail.com. IEEE Transactions on Knowledge and Data Engineering (May 2016) Quality-Aware Subgraph Matching Over Inconsistent Probabilistic Graph Databases Abstract - Resource Description Framework (RDF) has been widely used in the Semantic Web to describe resources and their relationships. The RDF graph is one of the most commonly used representations for RDF data. However, in many real applications such as the data extraction/integration, RDF graphs integrated from different data sources may often contain uncertain and inconsistent information (e.g., uncertain labels or that violate facts/rules), due to the unreliability of data sources. In this paper, we formalize the RDF data by inconsistent probabilistic RDF graphs, which contain both inconsistencies and uncertainty. With such a probabilistic graph model, we focus on an important problem, quality-aware subgraph matching over inconsistent probabilistic RDF graphs (QA-gMatch), which retrieves subgraphs from inconsistent probabilistic RDF graphs that are isomorphic to a given query graph and with high quality scores (considering both consistency and uncertainty). In order to efficiently answer QA- gMatch queries, we provide two effective pruning methods, namely adaptive label pruning and quality score pruning, which can greatly filter out false alarms of subgraphs. We also design an effective index to facilitate our proposed pruning methods, and propose an efficient approach for processing QA-gMatch queries. Finally, we demonstrate the efficiency and effectiveness of our proposed approaches through extensive experiments. 
IEEE Transactions on Knowledge and Data Engineering (June 1 2016)
CavSimBase: A Database for Large Scale Comparison of Protein Binding Sites
Abstract - CavBase is a database containing information about the three-dimensional geometry and the physicochemical properties of putative protein binding sites. Analyzing CavBase data typically involves computing the similarity of pairs of binding sites. In contrast to sequence alignment, however, a structural comparison of protein binding sites is a computationally challenging problem, making large scale studies difficult or even infeasible. One possibility to overcome this obstacle is to precompute pairwise similarities in an all-against-all comparison, and to make these similarities subsequently accessible to data analysis methods. Pairwise
similarities, once being computed, can also be used to equip CavBase with a neighborhood structure. Taking advantage of this structure, methods for problems such as similarity retrieval can be implemented efficiently. In this paper, we tackle the problem of performing an all-against-all comparison using CavBase, consisting of more than 200,000 protein cavities, by means of parallel computation and cloud computing techniques. We present the conceptual design and technical realization of a large-scale study to create a similarity database called CavSimBase. We illustrate how CavSimBase is constructed, is accessed, and is used to answer biological questions by data analysis and similarity retrieval.
IEEE Transactions on Knowledge and Data Engineering (June 1 2016)
Online Subgraph Skyline Analysis over Knowledge Graphs
Abstract - Subgraph search is very useful in many real-world applications. However, users may be overwhelmed by the masses of matches. In this paper, we propose a subgraph skyline analysis problem, denoted as $S^2A$, to support more complicated analysis over graph data. Specifically, given a large graph $G$ and a query graph $q$, we want to find all the subgraphs $g$ in $G$, such that $g$ is graph isomorphic to $q$ and not dominated by any other subgraphs. In order to improve the efficiency, we devise a hybrid feature encoding incorporating both structural and numeric features based on a partitioning strategy, and discuss how to optimize the space partitioning. We also present a skylayer index to facilitate the dynamic subgraph skyline computation. Moreover, an attribute cluster-based method is proposed to deal with the curse of dimensionality. Extensive experiments over real datasets confirm the effectiveness and efficiency of our algorithm.
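To make the notion of "not dominated by any other subgraphs" in the skyline abstract above concrete, here is a small Python sketch of skyline dominance over plain numeric attribute vectors. It is only an illustration under stated assumptions: smaller values are taken to be better, candidates are tuples rather than matched subgraphs, and the names dominates and skyline are hypothetical. The subgraph isomorphism test, the feature encoding, and the skylayer index from the abstract are not modelled.

```python
def dominates(a, b):
    """Return True if vector a dominates vector b.

    Assumption for this sketch: smaller is better in every dimension.
    a dominates b when a is no worse everywhere and strictly better somewhere.
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def skyline(points):
    """Naive quadratic skyline: keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


if __name__ == "__main__":
    candidates = [(1, 4), (2, 2), (3, 3), (4, 1)]
    print(skyline(candidates))
```

The quadratic scan is the simplest possible formulation; it is meant to show the dominance relation itself, not an efficient way to compute a skyline.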
IEEE Transactions on Knowledge and Data Engineering (July 1 2016)
K Nearest Neighbour Joins for Big Data on MapReduce: a Theoretical and Experimental Analysis
Abstract - Given a point p and a set of points S, the kNN operation finds the k closest points to p in S. It is a computationally intensive task with a large range of applications such as knowledge discovery or data mining. However, as the volume and the dimension of data increase, only
distributed approaches can perform such a costly operation in a reasonable time. Recent works have focused on implementing efficient solutions using the MapReduce programming model because it is suitable for distributed large scale data processing. Although these works provide different solutions to the same problem, each one has particular constraints and properties. In this paper, we compare the different existing approaches for computing kNN on MapReduce, first theoretically, and then by performing an extensive experimental evaluation. To be able to compare solutions, we identify three generic steps for kNN computation on MapReduce: data pre-processing, data partitioning and computation. We then analyze each step from load balancing, accuracy and complexity aspects. Experiments in this paper use a variety of datasets, and analyze the impact of data volume, data dimension and the value of k from many perspectives like time and space complexity, and accuracy. The experimental part brings new advantages and shortcomings that are discussed for each algorithm. To the best of our knowledge, this is the first paper that compares kNN computing methods on MapReduce both theoretically and experimentally with the same setting. Overall, this paper can be used as a guide to tackle kNN-based practical problems in the context of big data.
IEEE Transactions on Knowledge and Data Engineering (May 2016)
ATD: Anomalous Topic Discovery in High Dimensional Discrete Data
Abstract - We propose an algorithm for detecting patterns exhibited by anomalous clusters in high dimensional discrete data. Unlike most anomaly detection (AD) methods, which detect individual anomalies, our proposed method detects groups (clusters) of anomalies; i.e. sets of points which collectively exhibit abnormal patterns.
In many applications this can lead to better understanding of the nature of the atypical behavior and to identifying the sources of the anomalies. Moreover, we consider the case where the atypical patterns exhibit on only a small (salient) subset of the very high dimensional feature space. Individual AD techniques and techniques that detect anomalies using all the features typically fail to detect such anomalies, but our method can detect such instances collectively, discover the shared anomalous patterns exhibited by them, and identify the subsets of salient features. In this paper, we focus on detecting anomalous topics in a batch of text documents, developing our algorithm based on
topic models. Results of our experiments show that our method can accurately detect anomalous topics and salient features (words) under each such topic in a synthetic data set and two real-world text corpora and achieves better performance compared to both standard group AD and individual AD techniques. All required code to reproduce our experiments is available from https://github.com/hsoleimani/ATD.
IEEE Transactions on Knowledge and Data Engineering (May 2016)
Multilabel Classification via Co-evolutionary Multilabel Hypernetwork
Abstract - Multilabel classification is prevalent in many real-world applications where data instances may be associated with multiple labels simultaneously. In multilabel classification, exploiting label correlations is an essential but nontrivial task. Most of the existing multilabel learning algorithms are either ineffective or computationally demanding and less scalable in exploiting label correlations. In this paper, we propose a co-evolutionary multilabel hypernetwork (Co-MLHN) as an attempt to exploit label correlations in an effective and efficient way. To this end, we firstly convert the traditional hypernetwork into a multilabel hypernetwork (MLHN) where label correlations are explicitly represented. We then propose a co-evolutionary learning algorithm to learn an integrated classification model for all labels. The proposed Co-MLHN exploits arbitrary order label correlations and has linear computational complexity with respect to the number of labels. Empirical studies on a broad range of multilabel data sets demonstrate that Co-MLHN achieves competitive results against state-of-the-art multilabel learning algorithms, in terms of both classification performance and scalability with respect to the number of labels.
IEEE Transactions on Knowledge and Data Engineering (May 2016)
Learning to Find Topic Experts in Twitter via Different Relations
Abstract - Expert finding has become a hot topic along with the flourishing of social networks, such as micro-blogging services like Twitter. Finding experts in Twitter is an important problem because tweets from experts are valuable sources that carry rich information (e.g., trends) in various domains. However, previous methods cannot be directly applied to Twitter expert
finding problem. Recently, several attempts use the relations among users and Twitter Lists for expert finding. Nevertheless, these approaches only partially utilize such relations. To this end, we develop a probabilistic method to jointly exploit three types of relations (i.e., follower relation, user-list relation, and list-list relation) for finding experts. Specifically, we propose a Semi-Supervised Graph-based Ranking approach (SSGR) to offline calculate the global authority of users. In SSGR, we employ a normalized Laplacian regularization term to jointly explore the three relations, which is subject to the supervised information derived from Twitter crowds. We then online compute the local relevance between users and the given query. By leveraging the global authority and local relevance of users, we rank all users and find the top-N users with the highest ranking scores. Experiments on real-world data demonstrate the effectiveness of our proposed approach for topic-specific expert finding in Twitter.
IEEE Transactions on Knowledge and Data Engineering (July 2016)
Analytic Queries over Geospatial Time-Series Data Using Distributed Hash Tables
Abstract - As remote sensing equipment and networked observational devices continue to proliferate, their corresponding data volumes have surpassed the storage and processing capabilities of commodity computing hardware. This trend has led to the development of distributed storage frameworks that incrementally scale out by assimilating resources as necessary. While challenging in its own right, storing and managing voluminous datasets is only the precursor to a broader field of research: extracting insights, relationships, and models from the underlying datasets.
The focus of this study is twofold: exploratory and predictive analytics over voluminous, multidimensional datasets in a distributed environment. Both of these types of analysis represent a higher-level abstraction over standard query semantics; rather than indexing every discrete value for subsequent retrieval, our framework autonomously learns the relationships and interactions between dimensions in the dataset and makes the information readily available to users. This functionality includes statistical synopses, correlation analysis, hypothesis testing, probabilistic structures, and predictive models that not only enable the discovery of nuanced relationships between dimensions, but also allow future events and trends to be predicted. The algorithms presented in this work were evaluated empirically on a real-world geospatial time-series dataset in a production environment, and are broadly applicable across other storage frameworks.
IEEE Transactions on Knowledge and Data Engineering (June 2016)
RSkNN: kNN Search on Road Networks by Incorporating Social Influence
Abstract - Although kNN search on a road network, i.e., finding the k nearest objects to a query user on the network, has been extensively studied, existing works neglected the fact that the query user's social information can play an important role in this kNN query. Many real-world applications, such as location-based social networking services, require such a query. In this paper, we study a new problem: kNN search on road networks by incorporating social influence (RSkNN). Specifically, the state-of-the-art Independent Cascade (IC) model in social network is applied to define social influence. One critical challenge of the problem is to speed up the computation of the social influence over large road and social networks. To address this challenge, we propose three efficient index-based search algorithms, i.e., road network-based (RN-based), social network-based (SN-based), and hybrid indexing algorithms. In the RN-based algorithm, we employ a filtering-and-verification framework for tackling the hard problem of computing social influence. In the SN-based algorithm, we embed social cuts into the index, so that we speed up the query. In the hybrid algorithm, we propose an index, summarizing the road and social networks, based on which we can obtain query answers efficiently. Finally, we use real road and social network data to empirically verify the efficiency and efficacy of our solutions.
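The RSkNN abstract above relies on the Independent Cascade (IC) model to define social influence. As a hedged illustration of how one IC diffusion run proceeds, the Python sketch below simulates a single cascade. The adjacency-dict graph, the single uniform activation probability prob, and the function name independent_cascade are assumptions made for this example; the abstract does not specify how the paper parameterizes the model, and none of the paper's indexes or search algorithms are shown.

```python
import random


def independent_cascade(graph, seeds, prob, rng):
    """Simulate one run of the Independent Cascade model.

    graph: dict mapping a node to a list of its out-neighbours (assumed format).
    seeds: initially active nodes.
    prob:  activation probability, assumed identical for every edge.
    rng:   a random.Random instance, passed in so runs are reproducible.
    Returns the set of nodes active when the cascade stops.
    """
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for u in frontier:
            # Each newly active node gets exactly one chance per inactive neighbour.
            for v in graph.get(u, []):
                if v not in active and rng.random() < prob:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return active


if __name__ == "__main__":
    g = {"a": ["b", "c"], "b": ["d"], "c": [], "d": []}
    print(sorted(independent_cascade(g, ["a"], 0.5, random.Random(42))))
```

In practice the expected influence of a seed set is usually estimated by averaging the size of the returned set over many such runs, which is part of why computing social influence is costly on large networks.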
IEEE Transactions on Knowledge and Data Engineering (June 2016)
Unsupervised Visual Hashing with Semantic Assistant for Content-based Image Retrieval
Abstract - As an emerging technology to support scalable content-based image retrieval (CBIR), hashing has recently received great attention and become a very active research domain. In this study, we propose a novel unsupervised visual hashing approach called semantic-assisted visual hashing (SAVH). Distinguished from semi-supervised and supervised visual hashing, its core idea is to effectively extract the rich semantics latently embedded in auxiliary texts of images to boost the effectiveness of visual hashing without any explicit semantic labels. To
achieve the target, a unified unsupervised framework is developed to learn hash codes by simultaneously preserving visual similarities of images, integrating the semantic assistance from auxiliary texts on modeling high-order relationships of inter-images and characterizing the correlations between images and shared topics. Our performance study on three publicly available image collections: Wiki, MIR Flickr, and NUS-WIDE indicates that SAVH can achieve superior performance over several state-of-the-art techniques.
IEEE Transactions on Knowledge and Data Engineering (May 2016)
A Scalable Data Chunk Similarity based Compression Approach for Efficient Big Sensing Data Processing on Cloud
Abstract - Big sensing data is prevalent in both industry and scientific research applications where the data is generated with high volume and velocity. Cloud computing provides a promising platform for big sensing data processing and storage as it provides a flexible stack of massive computing, storage, and software services in a scalable manner. Current big sensing data processing on Cloud has adopted some data compression techniques. However, due to the high volume and velocity of big sensing data, traditional data compression techniques lack sufficient efficiency and scalability for data processing. Based on specific on-Cloud data compression requirements, we propose a novel scalable data compression approach based on calculating similarity among the partitioned data chunks. Instead of compressing basic data units, the compression will be conducted over partitioned data chunks. To restore original data sets, some restoration functions and predictions will be designed. MapReduce is used for algorithm implementation to achieve extra scalability on Cloud.
With real world meteorological big sensing data experiments on U-Cloud platform, we demonstrate that the proposed scalable compression approach based on data chunk similarity can significantly improve data compression efficiency with affordable data accuracy loss.
IEEE Transactions on Knowledge and Data Engineering (February 2016)
Network Motif Discovery: A GPU Approach
Abstract - The identification of network motifs has important applications in numerous domains, such as pattern detection in biological networks and graph analysis in digital circuits. However, mining network motifs is computationally challenging, as it requires enumerating subgraphs from a real-life graph, and computing the frequency of each subgraph in a large number of random graphs. In particular, existing solutions often require days to derive network motifs from biological networks with only a few thousand vertices. To address this problem, this paper presents a novel study on network motif discovery using Graphical Processing Units (GPUs). The basic idea is to employ GPUs to parallelize a large number of subgraph matching tasks in computing subgraph frequencies from random graphs, so as to reduce the overall computation time of network motif discovery. We explore the design space of GPU-based subgraph matching algorithms, with careful analysis of several crucial factors (such as branch divergences and memory coalescing) that affect the performance of GPU programs. Based on our analysis, we develop a GPU-based solution that (i) considerably differs from existing CPU-based methods in how it enumerates subgraphs, and (ii) exploits the strengths of GPUs in terms of parallelism while mitigating their limitations in terms of the computation power per GPU core. With extensive experiments on a variety of biological networks, we show that our solution is up to two orders of magnitude faster than the best CPU-based approach, and is around 20 times more cost-effective than the latter, when taking into account the monetary costs of the CPU and GPUs used.
IEEE Transactions on Knowledge and Data Engineering (May 2016)
Crowdsourced Data Management: A Survey
Abstract - Some important data management and analytics tasks cannot be completely addressed by automated processes. These “computer-hard” tasks such as entity resolution, sentiment analysis, and image recognition, can be enhanced through the use of human cognitive ability. Human Computation is an effective way to address such tasks by harnessing the capabilities of crowd workers (i.e., the crowd). Thus, crowdsourced data management has become an area of increasing interest in research and industry. There are three important problems in crowdsourced data management. (1) Quality Control: Workers may return noisy results and effective
techniques are required to achieve high quality; (2) Cost Control: The crowd is not free, and cost control aims to reduce the monetary cost; (3) Latency Control: The human workers can be slow, particularly in contrast to computing time scales, so latency-control techniques are required. There has been significant work addressing these three factors for designing crowdsourced tasks, developing crowdsourced data manipulation operators, and optimizing plans of multiple operators. In this paper, we survey and synthesize a wide spectrum of existing studies on crowdsourced data management. Based on this analysis we then outline key factors that need to be considered to improve crowdsourced data management.
IEEE Transactions on Knowledge and Data Engineering (February 2016)
Resolving Multi-Party Privacy Conflicts in Social Media
Abstract - Items shared through Social Media may affect more than one user's privacy, e.g., photos that depict multiple users, comments that mention multiple users, events in which multiple users are invited, etc. The lack of multi-party privacy management support in current mainstream Social Media infrastructures makes users unable to appropriately control to whom these items are actually shared or not. Computational mechanisms that are able to merge the privacy preferences of multiple users into a single policy for an item can help solve this problem. However, merging multiple users’ privacy preferences is not an easy task, because privacy preferences may conflict, so methods to resolve conflicts are needed. Moreover, these methods need to consider how users would actually reach an agreement about a solution to the conflict in order to propose solutions that can be acceptable by all of the users affected by the item to be shared.
Current approaches are either too demanding or only consider fixed ways of aggregating privacy preferences. In this paper, we propose the first computational mechanism to resolve conflicts for multi-party privacy management in Social Media that is able to adapt to different situations by modelling the concessions that users make to reach a solution to the conflicts. We also present results of a user study in which our proposed mechanism outperformed other existing approaches in terms of how many times each approach matched users’ behaviour.
IEEE Transactions on Knowledge and Data Engineering (July 2016)
Interactive Visualization of Large Data Sets
Abstract - Visualization provides a powerful means for data analysis. But to be practical, visual analytics tools must support smooth and flexible use of visualizations at a fast rate. This becomes increasingly onerous with the ever-increasing size of real-world datasets. First, large databases make interaction more difficult once query response time exceeds several seconds. Second, any attempt to show all data points will overload the visualization, resulting in chaos that will only confuse the user. Over the last few years substantial effort has been put into addressing both of these issues and many innovative solutions have been proposed. Indeed, data visualization is a topic that is too large to be addressed in a single survey paper. Thus, we restrict our attention here to interactive visualization of large data sets. Our focus then is skewed in a natural way towards query processing problem - provided by an underlying database system - rather than to the actual data visualization problem.
IEEE Transactions on Knowledge and Data Engineering (April 2016)
Improving Construction of Conditional Probability Tables for Ranked Nodes in Bayesian Networks
Abstract - This paper elaborates on the ranked nodes method (RNM) that is used for constructing conditional probability tables (CPTs) for Bayesian networks consisting of a class of nodes called ranked nodes. Such nodes typically represent continuous quantities that lack well-established interval scales and are hence expressed by ordinal scales. Based on expert elicitation, the CPT of a child node is generated in RNM by aggregating weighted states of parent nodes with a weight expression. RNM is also applied to nodes that are expressed by interval scales.
However, the use of the method in this way may be ineffective due to challenges which are not addressed in the existing literature but are demonstrated through an illustrative example in this paper. To overcome the challenges, the paper introduces a novel approach that facilitates the use of RNM. It consists of guidelines concerning the discretization of the interval scales into ordinal ones and the determination of a weight expression and weights based on assessments of the expert about the mode of the child node. The determination is premised on interpretations and feasibility
conditions of the weights derived in the paper. The utilization of the approach is demonstrated with the illustrative example throughout the paper.
IEEE Transactions on Knowledge and Data Engineering (July 2016)
Clearing Contamination in Large Networks
Abstract - In this work, we study the problem of clearing contamination spreading through a large network where we model the problem as a graph searching game. The problem can be summarized as constructing a search strategy that will leave the graph clear of any contamination at the end of the searching process in as few steps as possible. We show that this problem is NP-hard even on directed acyclic graphs and provide an efficient approximation algorithm. We experimentally observe the performance of our approximation algorithm in relation to the lower bound on several large online networks including Slashdot, Epinions, and Twitter.
IEEE Transactions on Knowledge and Data Engineering (June 2016)
Private Over-threshold Aggregation Protocols over Distributed Databases
Abstract - In this paper, we revisit the private over-threshold data aggregation problem. We formally define the problem’s security requirements as both data and user privacy goals. To achieve both goals, and to strike a balance between efficiency and functionality, we devise an efficient cryptographic construction and its proxy-based variant. Both schemes are provably secure in the semi-honest model. Our key idea for the constructions and their malicious variants is to compose two encryption functions tightly coupled in a way that the two functions are commutative and one public-key encryption has an additive homomorphism.
We call that double encryption. We analyze the computational and communication complexities of our construction, and show that it is much more efficient than the existing protocols in the literature. Specifically, our protocol has linear complexity in computation and communication with respect to the number of users. Its round complexity is also linear in the number of users. Finally, we show that our basic protocol is efficiently transformed into a stronger protocol secure in the presence of malicious adversaries, and provide the resulting protocol’s performance and security analysis.
IEEE Transactions on Knowledge and Data Engineering (May 2016)
Challenges in Data Crowdsourcing
Abstract - Crowdsourcing refers to solving large problems by involving human workers that solve component sub-problems or tasks. In data crowdsourcing, the problem involves data acquisition, management, and analysis. In this paper, we provide an overview of data crowdsourcing, giving examples of problems that the authors have tackled, and presenting the key design steps involved in implementing a crowdsourced solution. We also discuss some of the open challenges that remain to be solved.
IEEE Transactions on Knowledge and Data Engineering (April 2016)
Efficient R-Tree Based Indexing Scheme for Server-Centric Cloud Storage System
Abstract - Cloud storage system poses new challenges to the community to support efficient concurrent querying tasks for various data-intensive applications, where indices always hold important positions. In this paper, we explore a practical method to construct a two-layer indexing scheme for multi-dimensional data in diverse server-centric cloud storage system. We first propose RT-HCN, an indexing scheme integrating R-tree based indexing structure and HCN-based routing protocol. RT-HCN organizes storage and compute nodes into an HCN overlay, one of the newly proposed server-centric data center topologies. Based on the properties of HCN, we design a specific index mapping technique to maintain layered global indices and corresponding query processing algorithms to support efficient query tasks. Then, we expand the idea of RT-HCN onto another server-centric data center topology DCell, discovering a potential generalized and feasible way of deploying two-layer indexing schemes on other server-centric networks.
Furthermore, we prove theoretically that RT-HCN is both space-efficient and query-efficient, by which each node actually maintains a tolerable number of global indices while high concurrent queries can be processed within accepted overhead. We finally conduct targeted experiments on Amazon's EC2 platforms, comparing our design with RT-CAN, a similar indexing scheme for traditional P2P network. The results validate the query efficiency, especially the speedup of point query of RT-HCN, depicting its potential applicability in future data centers.
IEEE Transactions on Knowledge and Data Engineering (June 2016)
SUPPORT OFFERED TO REGISTERED STUDENTS:
1. IEEE base paper.
2. Review material as per the individual's university guidelines.
3. Future enhancement.
4. Assistance in answering all critical questions.
5. Training on the programming language.
6. Complete source code.
7. Final report / document.
8. International conference / international journal publication on your project.
FOLLOW US ON FACEBOOK @ TSYS Academic Projects