SMalL - Semantic Malware Log-based reporter

                               Stefan Ceriu, Stefan Prutianu
            Faculty of Computer Science, „Al. I. Cuza“ University, Iasi, Romania
                         { stefan.ceriu, stefan.prutianu}@info.uaic.ro



      Abstract. In this paper we present the SMalL Ontology for malicious software
      classification, the SMalL Java Application for comparing antivirus systems, and
      the SMalL knowledge-based file format for malware-related attacks. We believe
      that our ontology can aid the development of malware prevention software
      by offering a common knowledge base and a clear classification of existing
      malicious software. The application is a prototype showing how this ontology
      might be used in conjunction with known antivirus capabilities to offer a
      comprehensive comparison.

      Keywords: malware, semantic web, jena, owl, protégé, ontology, virus, worm,
      Trojan, spyware, crimeware;




1 Introduction


         Malware, also known as malicious code and malicious software, refers to a
program that is inserted into a system, usually covertly, with the intent of
compromising the confidentiality, integrity, or availability of the victim's data,
applications, or operating system or otherwise annoying or disrupting the victim.
Malware has become the most significant external threat to most systems, causing
widespread damage and disruption, and necessitating extensive recovery efforts
within most organizations. Spyware, malware intended to violate a user's privacy, has
also become a major concern to organizations. Although privacy-violating malware
has been in use for many years, it has become much more widespread recently, with
spyware invading many systems to monitor personal activities and conduct financial
fraud. Organizations also face similar threats from a few forms of non-malware
threats that are often associated with malware. One of these forms that has become
commonplace is phishing, which is using deceptive computer-based means to trick
individuals into disclosing sensitive information. Another common form is virus
hoaxes, which are false warnings of new malware threats.
        We will further look into ways in which to classify the different types of
malware by means of a new ontology and an application designed to work with it
to compare the different antivirus systems available.
2 Ontologies and OWL


2.1 Overview

         The term ontology originates from philosophy. In that context, it is used as
the name of a subfield of philosophy, namely, the study of the nature of existence, the
branch of metaphysics concerned with identifying, in the most general terms, the
kinds of things that actually exist, and how to describe them. For example, the
observation that the world is made up of specific objects that can be grouped into
abstract classes based on shared properties is a typical ontological commitment.
However, in more recent years, ontology has become one of the many words hijacked
by computer science and given a specific technical meaning that is rather different
from the original one. Instead of "ontology" we now speak of "an ontology." In
general, an ontology describes formally a domain of discourse. Typically, an ontology
consists of a finite list of terms and the relationships between these terms. The terms
denote important concepts (classes of objects) of the domain. For example, in a
university setting, staff members, students, courses, lecture theaters, and disciplines
are some important concepts. The relationships typically include hierarchies of
classes. A hierarchy specifies a class C to be a subclass of another class S if every
object in C is also included in S. For example, all faculty members are staff members.
         Apart from subclass relationships, ontologies may include information
such as:
  •    properties (X teaches Y)
  •    value restrictions (only faculty members may teach courses)
  •    disjointness statements (faculty and general staff are disjoint)
  •    specifications of logical relationships between objects (every
       department must include at least ten faculty members).
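
The university example above can be sketched as a small data structure. This is only an illustration in Python, not part of the SMalL application: the `Ontology` class, its field names and the class names `FacultyMember`, `GeneralStaff` and `StaffMember` are assumptions, and the value restriction is simplified to a restriction on who may appear as the subject of a property.

```python
from dataclasses import dataclass, field


@dataclass
class Ontology:
    # class name -> direct superclass (hypothetical, single inheritance only)
    subclass_of: dict = field(default_factory=dict)
    # pairs of classes declared disjoint, stored as frozensets
    disjoint: set = field(default_factory=set)
    # property name -> class whose members may be the subject of the property
    domain: dict = field(default_factory=dict)

    def ancestors(self, cls):
        """Return cls followed by all of its superclasses."""
        result = [cls]
        while cls in self.subclass_of:
            cls = self.subclass_of[cls]
            result.append(cls)
        return result

    def may_assert(self, subject_class, prop):
        """Simplified value restriction: may subject_class use prop?"""
        return self.domain[prop] in self.ancestors(subject_class)

    def are_disjoint(self, a, b):
        return frozenset((a, b)) in self.disjoint


university = Ontology(
    subclass_of={"FacultyMember": "StaffMember", "GeneralStaff": "StaffMember"},
    disjoint={frozenset(("FacultyMember", "GeneralStaff"))},
    domain={"teaches": "FacultyMember"},
)

print(university.ancestors("FacultyMember"))
print(university.may_assert("GeneralStaff", "teaches"))
```

Here every faculty member is also a staff member, only faculty members may teach, and faculty and general staff are declared disjoint.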
         In the context of the Web, ontologies provide a shared understanding of a
domain. Such a shared understanding is necessary to overcome differences in
terminology. One application's zip code may be the same as another application's
area code. Another problem is that two applications may use the same term with
different meanings. In university A, a course may refer to a degree (like computer
science), while in university B it may mean a single subject (CS 101). Such
differences can be overcome by mapping the particular terminology to a shared
ontology or by defining direct mappings between the ontologies. In either case, it is
easy to see that ontologies support semantic interoperability.
         Ontologies are useful for the organization and navigation of Web sites. Many
web sites today expose on the left-hand side of the page the top levels of a concept
hierarchy of terms. The user may click on one of them to expand the subcategories.
Also, ontologies are useful for improving the accuracy of Web searches. The search
engines can look for pages that refer to a precise concept in an ontology instead of
collecting all pages in which certain, generally ambiguous, keywords occur. In this
way, differences in terminology between Web pages and the queries can be
overcome. In addition, Web searches can exploit generalization/specialization
information. If a query fails to find any relevant documents, the search engine may
suggest to the user a more general query. It is even conceivable for the engine to run
such queries proactively to reduce the reaction time in case the user adopts a
suggestion. Or if too many answers are retrieved, the search engine may suggest to
the user some specializations.
         The Web Ontology Working Group of W3C identified a number of
characteristic use cases for the Semantic Web that would require much more
expressiveness than RDF and RDF Schema offer. A number of research groups in
both the United States and Europe had already identified the need for a more powerful
ontology modeling language. This led to a joint initiative to define a richer language,
called DAML+OIL (the name joins the names of the U.S. proposal DAML-ONT
and the European language OIL). DAML+OIL in turn was taken as the starting
point for the W3C Web Ontology Working Group in defining OWL, the language that
is aimed to be the standardized and broadly accepted ontology language of the
Semantic Web.
         Ontology languages allow users to write explicit, formal conceptualizations
of domain models. The main requirements are a well-defined syntax, efficient
reasoning support, a formal semantics, sufficient expressive power and convenience
of expression. The importance of a well-defined syntax is clear and known from the
area of programming languages; it is a necessary condition for machine processing of
information. All the languages we have mentioned so far have a well-defined syntax.
DAML+OIL and OWL build upon RDF and RDFS and have the same kind of syntax.
Of course, it is questionable whether the XML-based RDF syntax is very user-
friendly; there are alternatives better suited to human users (for example, see the OIL
syntax). However, this drawback is not very significant because ultimately users will
be developing their own ontologies using authoring tools, or more generally, ontology
development tools, instead of writing them directly in DAML+OIL or OWL.
A formal semantics describes the meaning of knowledge precisely. Precisely here
means that the semantics does not refer to subjective intuitions, nor is it open to
different interpretations by different people (or machines). The importance of a
formal semantics is well-established in the domain of mathematical logic, for
instance. One use of a formal semantics is to allow people to reason about the
knowledge. For ontological knowledge, we may reason about the following:
  •    Class membership. If x is an instance of a class C, and C is a subclass of D,
       then we can infer that x is an instance of D.
  •    Equivalence of classes. If class A is equivalent to class B, and class B is
       equivalent to class C, then A is equivalent to C, too.
  •    Consistency. Suppose we have declared x to be an instance of the class A
       and that A is a subclass of B ∩ C, A is a subclass of D, and B and D are
       disjoint. Then we have an inconsistency because A should be empty but has
       the instance x. This is an indication of an error in the ontology.
  •    Classification. If we have declared that certain property-value pairs are a
       sufficient condition for membership in a class A, then if an individual x
       satisfies such conditions, we can conclude that x must be an instance of A.
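
Two of these inferences, class membership and consistency, can be sketched in a few lines of Python. This is a toy illustration and not a description logic reasoner: the function names are assumptions, and the statement that A is a subclass of B ∩ C is encoded simply as A being a subclass of both B and C.

```python
def superclasses(cls, subclass_of):
    """All classes reachable from cls through subclass links, including cls."""
    seen, stack = set(), [cls]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(subclass_of.get(current, ()))
    return seen


def inferred_types(individual, asserted, subclass_of):
    """Class membership: propagate asserted types up the hierarchy."""
    types = set()
    for cls in asserted.get(individual, ()):
        types |= superclasses(cls, subclass_of)
    return types


def is_consistent(asserted, subclass_of, disjoint_pairs):
    """Consistency: no individual may belong to two disjoint classes."""
    for individual in asserted:
        types = inferred_types(individual, asserted, subclass_of)
        if any(a in types and b in types for a, b in disjoint_pairs):
            return False
    return True


# The consistency example from the text: A ⊆ B ∩ C, A ⊆ D, x is an instance of A.
subclass_of = {"A": ["B", "C", "D"]}
asserted = {"x": ["A"]}

print(inferred_types("x", asserted, subclass_of))
print(is_consistent(asserted, subclass_of, [("B", "D")]))
```

With B and D declared disjoint the check fails, because x is inferred to be an instance of both.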

    Semantics is a prerequisite for reasoning support. Derivations such as the
preceding ones can be made mechanically instead of being made by hand.
Reasoning support is important because it allows one to:
  •    check the consistency of the ontology and the knowledge
  •    check for unintended relationships between classes
  •    automatically classify instances in classes

          Automated reasoning support allows one to check many more cases than
could be checked manually. Checks like the preceding ones are valuable for designing
large ontologies, where multiple authors are involved, and for integrating and sharing
ontologies from various sources. A formal semantics and reasoning support are
usually provided by mapping an ontology language to a known logical formalism, and
by using automated reasoners that already exist for those formalisms. OWL is
(partially) mapped on description logic, and makes use of existing reasoners such as
FaCT and RACER. Description logics are a subset of predicate logic for which
efficient reasoning support is possible.
          RDF and RDFS allow the representation of some ontological knowledge.
The main modeling primitives of RDF/RDFS concern the organization of
vocabularies in typed hierarchies: subclass and sub-property relationships, domain
and range restrictions, and instances of classes. However, a number of other features
are missing. Here we list a few:
  •    Local scope of properties. rdfs:range defines the range of a property,
       say eats, for all classes. Thus in RDF Schema we cannot declare range
       restrictions that apply to some classes only. For example, we cannot say that
       cows eat only plants, while other animals may eat meat too.
  •    Disjointness of classes. Sometimes we wish to say that classes are disjoint.
       For example, male and female are disjoint. But in RDF Schema we can only
       state subclass relationships, e.g., female is a subclass of person.
  •    Boolean combinations of classes. Sometimes we wish to build new classes
       by combining other classes using union, intersection, and complement. For
       example, we may wish to define the class person to be the disjoint union of
       the classes male and female. RDF Schema does not allow such definitions.
  •    Cardinality restrictions. Sometimes we wish to place restrictions on how
       many distinct values a property may or must take. For example, we would
       like to say that a person has exactly two parents, or that a course is taught by
       at least one lecturer. Again, such restrictions are impossible to express in
       RDF Schema.
  •    Special characteristics of properties. Sometimes it is useful to say that a
       property is transitive (like "greater than"), unique (like "is mother of"), or
       the inverse of another property (like "eats" and "is eaten by").
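
To make the cardinality item concrete, the sketch below checks the two example restrictions against a list of triples. It is a closed-world validation written in Python for illustration: it assumes that the listed triples are complete and that different names denote different individuals, which an OWL reasoner would not assume. The function name, property names and sample data are hypothetical.

```python
def check_cardinality(triples, subject, prop, minimum=0, maximum=None):
    """Count the distinct values of prop for subject and compare to the bounds."""
    values = {o for s, p, o in triples if s == subject and p == prop}
    if len(values) < minimum:
        return False
    return maximum is None or len(values) <= maximum


triples = [
    ("alice", "hasParent", "carol"),
    ("alice", "hasParent", "dave"),
    ("cs101", "isTaughtBy", "carol"),
]

# "A person has exactly two parents."
print(check_cardinality(triples, "alice", "hasParent", minimum=2, maximum=2))
# "A course is taught by at least one lecturer."
print(check_cardinality(triples, "cs102", "isTaughtBy", minimum=1))
```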

         Thus we need an ontology language that is richer than RDF Schema, a
language that offers these features and more. In designing such a language one should
be aware of the trade-off between expressive power and efficient reasoning support.
Generally speaking, the richer the language, the more inefficient the reasoning
support becomes, often crossing the border of non-computability. Thus we need a
compromise, a language that can be supported by reasonably efficient reasoners while
being sufficiently expressive to express large classes of ontologies and knowledge.
2.2 Protégé

         Knowledge about the application domain is one of the most important
cornerstones of successful software projects. We must gather at least a basic
understanding of the concepts relevant to our customers before we can begin coding.
For example, we need to know how a customer's business processes work before
we can develop a warehouse management system; we need to know that users who
buy cat food might also be interested in cat litter before we can implement purchase
recommendations for an online shop.
         We acquire such knowledge from domain experts and capture it in some kind
of domain model. In simple cases, we can scribble these models on paper. This
approach works fine for small projects and when the experts help us decipher their
handwriting. But it's better to have models that directly translate into a Java program.
For instance, we can use Unified Modeling Language (UML) to sketch the domain
models with class diagrams and use cases. UML is quite good for quickly getting to
an implementation, but it is basically a language for object-oriented programming that
few domain experts fully understand. And it consists of a fixed set of modeling
constructs (such as classes and attributes) that are not very useful when domain
experts would rather talk about specific business processes and products.
         The Protégé-OWL editor is an extension of Protégé that supports the Web
Ontology Language (OWL). OWL is the most recent development in standard
ontology languages, endorsed by the World Wide Web Consortium (W3C) to
promote the Semantic Web vision. An OWL ontology may include descriptions of
classes, properties and their instances. Given such an ontology, the OWL formal
semantics specifies how to derive its logical consequences, i.e. facts not literally
present in the ontology, but entailed by the semantics. These entailments may be
based on a single document or multiple distributed documents that have been
combined using defined OWL mechanisms.

  The Protégé-OWL editor enables users to:
  •    Load and save OWL and RDF ontologies.
  •    Edit and visualize classes, properties, and SWRL rules.
  •    Define logical class characteristics as OWL expressions.
  •    Execute reasoners such as description logic classifiers.
  •    Edit OWL individuals for Semantic Web markup.

          Protégé-OWL's flexible architecture makes it easy to configure and extend
the tool. It is tightly integrated with Jena and has an open-source Java API for the
development of custom-tailored user interface components or arbitrary Semantic Web
services.
          From a programmer's perspective, one of Protégé's most attractive features is
that it provides an open source API to plug in your own Java components and access
the domain models from your application. As a result, you can develop systems very
rapidly: just start with the underlying domain model, let Protégé generate the basic
user interface, and then gradually write widgets and plug-ins to customize look-and-
feel and behavior.
           Individuals represent objects in the domain in which we are interested. An
important difference between Protégé and OWL is that OWL does not use the Unique
Name Assumption (UNA). This means that two different names could actually refer
to the same individual. For example, "Queen Elizabeth", "The Queen" and "Elizabeth
Windsor" might all refer to the same individual. In OWL, it must be explicitly stated
that individuals are the same as each other, or different from each other; otherwise
they might be the same as each other, or they might be different from each other.
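
The effect of dropping the Unique Name Assumption can be sketched with a union-find structure that answers "same", "different" or "unknown" for a pair of names. This is a minimal Python illustration; the `Identity` class and its method names are assumptions and do not correspond to any Protégé or Jena API.

```python
class Identity:
    """Tracks explicit 'same as' and 'different from' statements between names."""

    def __init__(self):
        self.parent = {}       # union-find forest over names
        self.different = []    # explicitly stated pairs of different names

    def find(self, name):
        self.parent.setdefault(name, name)
        while self.parent[name] != name:
            self.parent[name] = self.parent[self.parent[name]]  # path halving
            name = self.parent[name]
        return name

    def same_as(self, a, b):
        self.parent[self.find(a)] = self.find(b)

    def different_from(self, a, b):
        self.different.append((a, b))

    def compare(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return "same"
        for x, y in self.different:
            if {self.find(x), self.find(y)} == {ra, rb}:
                return "different"
        return "unknown"  # nothing stated either way


ids = Identity()
ids.same_as("Queen Elizabeth", "The Queen")
ids.same_as("The Queen", "Elizabeth Windsor")
ids.different_from("The Queen", "Matthew")

print(ids.compare("Queen Elizabeth", "Elizabeth Windsor"))
print(ids.compare("Elizabeth Windsor", "Matthew"))
print(ids.compare("Matthew", "Gemma"))
```

The third comparison stays undecided because nothing has been stated about Matthew and Gemma, which is exactly the situation the text describes.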
           Properties are binary relations on individuals - i.e. properties link two
individuals together. For example, the property hasSibling might link the individual
Matthew to the individual Gemma, or the property hasChild might link the individual
Peter to the individual Matthew. Properties can have inverses. For example, the
inverse of hasOwner is owns. Properties can be limited to having a single value,
i.e. to being functional. They can also be either transitive or symmetric.
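
These property characteristics can be illustrated by treating a property as a set of (subject, value) pairs. The Python sketch below is an assumption-laden toy: the helper names, the hasParent and hasAncestor properties and the individual William are hypothetical, and a real reasoner derives these facts from declared axioms rather than from explicit closures.

```python
def with_inverse(pairs):
    """Pairs of the inverse property: swap subject and value."""
    return {(b, a) for a, b in pairs}


def symmetric_closure(pairs):
    """A symmetric property holds in both directions."""
    return set(pairs) | with_inverse(pairs)


def transitive_closure(pairs):
    """Repeatedly chain pairs until nothing new can be derived."""
    closure = set(pairs)
    while True:
        new = {(a, d) for a, b in closure for c, d in closure if b == c}
        if new <= closure:
            return closure
        closure |= new


def is_functional(pairs):
    """A functional property gives each subject at most one value."""
    subjects = [a for a, _ in set(pairs)]
    return len(subjects) == len(set(subjects))


has_child = {("Peter", "Matthew")}
has_parent = with_inverse(has_child)
has_sibling = symmetric_closure({("Matthew", "Gemma")})
has_ancestor = transitive_closure({("Matthew", "Peter"), ("Peter", "William")})

print(has_parent)
print(("Gemma", "Matthew") in has_sibling)
print(("Matthew", "William") in has_ancestor)
```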
           OWL classes are interpreted as sets that contain individuals. They are
described using formal (mathematical) descriptions that state precisely the
requirements for membership of the class. For example, the class Cat would contain
all the individuals that are cats in our domain of interest. Classes may be organised
into a superclass-subclass hierarchy, which is also known as a taxonomy. Subclasses
specialize ('are subsumed by') their superclasses. For example consider the classes
Animal and Cat: Cat might be a subclass of Animal (so Animal is the superclass of
Cat). This says that 'All cats are animals', 'All members of the class Cat are
members of the class Animal', 'Being a Cat implies that you're an Animal', and 'Cat
is subsumed by Animal'. One of the key features of OWL-DL is that these superclass-
subclass relationships (subsumption relationships) can be computed automatically by
a reasoner. In OWL, classes are built up of descriptions that specify the conditions
that must be satisfied by an individual for it to be a member of the class.
           OWL classes are assumed to 'overlap'. We therefore cannot assume that an
individual is not a member of a particular class simply because it has not been
asserted to be a member of that class. In order to 'separate' a group of classes we
must make them disjoint from one another. This ensures that an individual that has
been asserted to be a member of one of the classes in the group cannot be a member
of any other classes in that group.
           One of the key features of ontologies that are described using OWL-DL is
that they can be processed by a reasoner. One of the main services offered by a
reasoner is to test whether or not one class is a subclass of another class. By
performing such tests on the classes in an ontology it is possible for a reasoner to
compute the inferred ontology class hierarchy. Another standard service that is
offered by reasoners is consistency checking. Based on the description (conditions) of
a class the reasoner can check whether or not it is possible for the class to have any
instances. A class is deemed to be inconsistent if it cannot possibly have any
instances.
           Protégé allows different OWL reasoners to be plugged in; the reasoner
shipped with Protégé is called FaCT++. The ontology can be 'sent to the reasoner' to
automatically compute the classification hierarchy and also to check the logical
consistency of the ontology. In Protégé the 'manually constructed' class hierarchy is
called the asserted hierarchy. The class hierarchy that is automatically computed by
the reasoner is called the inferred hierarchy. Being able to use a reasoner to
automatically compute the class hierarchy is one of the major benefits of building an
ontology using the OWL-DL sub-language. When constructing very large ontologies
(with upwards of several thousand classes in them) the use of a reasoner to compute
subclass-superclass relationships between classes becomes almost vital. Without a
reasoner it is very difficult to keep large ontologies in a maintainable and logically
correct state. In cases where ontologies can have classes that have many superclasses
(multiple inheritance) it is nearly always a good idea to construct the class hierarchy
as a simple tree. Classes in the asserted hierarchy (manually constructed hierarchy)
therefore have no more than one superclass. Computing and maintaining multiple
inheritance is the job of the reasoner. This technique helps to keep the ontology in a
maintainable and modular state. Not only does this promote the reuse of the ontology
by other ontologies and applications, it also minimizes human errors that are inherent
in maintaining a multiple inheritance hierarchy.
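
The difference between the asserted and the inferred hierarchy can be sketched as follows. The Python example is a strong simplification of what a reasoner such as FaCT++ does: each class is assumed to be fully defined by a set of atomic conditions, and one class is taken to subsume another when its conditions are a subset of the other's. The class names and conditions are hypothetical and are not taken from the SMalL ontology.

```python
def inferred_superclasses(definitions):
    """Map each class to every other class whose conditions it satisfies."""
    return {
        cls: {
            other
            for other, needed in definitions.items()
            if other != cls and needed <= conditions
        }
        for cls, conditions in definitions.items()
    }


# Hypothetical definitions: class -> set of defining conditions.
definitions = {
    "Malware": {"malicious"},
    "Worm": {"malicious", "spreads_over_network"},
    "Spyware": {"malicious", "collects_user_data"},
    "SpyingWorm": {"malicious", "spreads_over_network", "collects_user_data"},
}

# Asserted hierarchy kept as a simple tree: one superclass per class.
asserted = {"Worm": "Malware", "Spyware": "Malware", "SpyingWorm": "Worm"}

inferred = inferred_superclasses(definitions)
print(inferred["SpyingWorm"])
```

Although SpyingWorm is asserted only under Worm, the computed hierarchy also places it under Spyware, so the multiple inheritance is derived rather than maintained by hand.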


3 Malware


3.1 Overview

          Malware, short for malicious software, is software designed to infiltrate a
computer system without the owner's informed consent. The expression is a general
term used by computer professionals to mean a variety of forms of hostile, intrusive,
or annoying software or program code. The term "computer virus" is sometimes used
as a catch-all phrase to include all types of malware, including true viruses. Software
is considered malware based on the perceived intent of the creator rather than any
particular features. Malware includes computer viruses, worms, Trojan horses, most
rootkits, spyware, dishonest adware, crimeware and other malicious and unwanted
software. In law, malware is sometimes known as a computer contaminant, for
instance in the legal codes of several U.S. states, including California and West
Virginia.
          Malware is not the same as defective software, that is, software that has a
legitimate purpose but contains harmful bugs. Preliminary results from Symantec
published in 2008 suggested that “the release rate of malicious code and other
unwanted programs may be exceeding that of legitimate software applications”.
According to F-Secure, "as much malware [was] produced in 2007 as in the previous
20 years altogether." Malware's most common pathway from criminals to users is
through the Internet: primarily by e-mail and the World Wide Web.
          The prevalence of malware as a vehicle for organized Internet crime, along
with the general inability of traditional anti-malware protection platforms to protect
against the continuous stream of unique and newly produced professional malware,
has seen the adoption of a new mindset for businesses operating on the Internet - the
acknowledgment that some sizable percentage of Internet customers will always be
infected for some reason or other, and that they need to continue doing business with
infected customers. The result is a greater emphasis on back-office systems designed
to spot fraudulent activities associated with advanced malware operating on
customers' computers.
         Many early infectious programs, including the first Internet Worm and a
number of MS-DOS viruses, were written as experiments or pranks generally
intended to be harmless or merely annoying rather than to cause serious damage to
computers. In some cases the perpetrators did not realize how much harm their
creations could do. Young programmers learning about viruses and their techniques
wrote them simply to prove that they could, or to see how far the programs would
spread. As late as 1999, widespread viruses such as the Melissa virus appear to have
been written chiefly as pranks.
         Hostile intent related to vandalism can be found in programs designed to
cause harm or data loss. Many DOS viruses, and the Windows ExploreZip worm,
were designed to destroy files on a hard disk, or to corrupt the file system by writing
invalid data. Network-borne worms such as the 2001 Code Red worm or the Ramen
worm fall into the same category. Designed to vandalize web pages, worms may seem
like the online equivalent to graffiti tagging, with the author's alias or affinity group
appearing everywhere the worm goes.
         However, since the rise of widespread broadband Internet access, malicious
software has come to be designed for a profit motive, either more or less legal (forced
advertising) or criminal. For instance, since 2003, the majority of widespread viruses
and worms have been designed to take control of users' computers for black-market
exploitation. Infected "zombie computers" are used to send email
spam, to host contraband data such as child pornography, or to engage in distributed
denial-of-service attacks as a form of extortion.
         Another strictly for-profit category of malware has emerged in spyware -
programs designed to monitor users' web browsing, display unsolicited
advertisements, or redirect affiliate marketing revenues to the spyware creator.
Spyware programs do not spread like viruses; they are, in general, installed by
exploiting security holes or are packaged with user-installed software, such as peer-
to-peer applications.
         The best-known types of malware, viruses and worms, are known for the
manner in which they spread, rather than any other particular behavior. The term
computer virus is used for a program that has infected some executable software and
that causes that software, when run, to spread the virus to other executable software.
Viruses may also contain a payload that performs other actions, often malicious. A
worm, on the other hand, is a program that actively transmits itself over a network to
infect other computers. It too may carry a payload.
         These definitions lead to the observation that a virus requires user
intervention to spread, whereas a worm spreads automatically. Using this distinction,
infections transmitted by email or Microsoft Word documents, which rely on the
recipient opening a file or email to infect the system, would be classified as viruses
rather than worms. Some writers in the trade and popular press appear to
misunderstand this distinction, and use the terms interchangeably.
         For a malicious program to accomplish its goals, it must be able to do so
without being shut down, or deleted by the user or administrator of the computer on
which it is running. Concealment can also help get the malware installed in the first
place. When a malicious program is disguised as something innocuous or desirable,
users may be tempted to install it without knowing what it does. This is the technique
of the Trojan horse or Trojan.
          In broad terms, a Trojan horse is any program that invites the user to run it,
concealing a harmful or malicious payload. The payload may take effect immediately
and can lead to many undesirable effects, such as deleting the user's files or further
installing malicious or undesirable software. Trojan horses known as droppers are
used to start off a worm outbreak, by injecting the worm into users' local networks.
One of the most common ways that spyware is distributed is as a Trojan horse,
bundled with a piece of desirable software that the user downloads from the Internet.
When the user installs the software, the spyware is installed alongside. Spyware
authors who attempt to act in a legal fashion may include an end-user license
agreement that states the behavior of the spyware in loose terms, which the users are
unlikely to read or understand.
          Once a malicious program is installed on a system, it is essential that it stay
concealed, to avoid detection and disinfection. The same is true when a human
attacker breaks into a computer directly. Techniques known as rootkits allow this
concealment, by modifying the host operating system so that the malware is hidden
from the user. Rootkits can prevent a malicious process from being visible in the
system's list of processes, or keep its files from being read. Originally, a rootkit was a
set of tools installed by a human attacker on a Unix system where the attacker had
gained administrator (root) access. Today, the term is used more generally for
concealment routines in a malicious program.
          Some malicious programs contain routines to defend against removal, not
merely to hide themselves, but to repel attempts to remove them. An early example of
this behavior is recorded in the Jargon File tale of a pair of programs infesting a
Xerox CP-V timesharing system. Each ghost-job would detect the fact that the other
had been killed, and would start a new copy of the recently slain program within a
few milliseconds. The only way to kill both ghosts was to kill them simultaneously
(very difficult) or to deliberately crash the system. Similar techniques are used by
some modern malware, wherein the malware starts a number of processes that
monitor and restore one another as needed.
          A backdoor is a method of bypassing normal authentication procedures.
Once a system has been compromised (by one of the above methods, or in some other
way), one or more backdoors may be installed in order to allow easier access in the
future. Backdoors may also be installed prior to malicious software, to allow attackers
entry.
          The idea has often been suggested that computer manufacturers preinstall
backdoors on their systems to provide technical support for customers, but this has
never been reliably verified. Crackers typically use backdoors to secure remote access
to a computer, while attempting to remain hidden from casual inspection. To install
backdoors, crackers may use Trojan horses, worms, or other methods.

         During the 1980s and 1990s, it was usually taken for granted that malicious
programs were created as a form of vandalism or prank. More recently, the greater
share of malware programs have been written with a financial or profit motive in
mind. This can be taken as the malware authors' choice to monetize their control over
infected systems: to turn that control into a source of revenue.
Spyware programs are commercially produced for the purpose of gathering
information about computer users, showing them pop-up ads, or altering web-browser
behavior for the financial benefit of the spyware creator. For instance, some spyware
programs redirect search engine results to paid advertisements. Others, often called
"stealware" by the media, overwrite affiliate marketing codes so that revenue is
redirected to the spyware creator rather than the intended recipient.
          Spyware programs are sometimes installed as Trojan horses of one sort or
another. They differ in that their creators present themselves openly as businesses, for
instance by selling advertising space on the pop-ups created by the malware. Most
such programs present the user with an end-user license agreement that purportedly
protects the creator from prosecution under computer contaminant laws. However,
spyware EULAs have not yet been upheld in court.
          Another way that financially-motivated malware creators can profit from
their infections is to directly use the infected computers to do work for the creator.
The infected computers are used as proxies to send out spam messages. A computer
left in this state is often known as a zombie computer. The advantage to spammers of
using infected computers is they provide anonymity, protecting the spammer from
prosecution. Spammers have also used infected PCs to target anti-spam organizations
with distributed denial-of-service attacks.
          In order to coordinate the activity of many infected computers, attackers
have used coordinating systems known as botnets. In a botnet, the malware or malbot
logs in to an Internet Relay Chat channel or other chat system. The attacker can then
give instructions to all the infected systems simultaneously. Botnets can also be used
to push upgraded malware to the infected systems, keeping them resistant to antivirus
software or other security measures.
          It is possible for a malware creator to profit by stealing sensitive information
from a victim. Some malware programs install a keylogger, which intercepts the
user's keystrokes when entering a password, credit card number, or other information
that may be exploited. This is then transmitted to the malware creator automatically,
enabling credit card fraud and other theft. Similarly, malware may copy the CD key
or password for online games, allowing the creator to steal accounts or virtual items.
          Another way of stealing money from the infected PC owner is to take control
of a dial-up modem and dial an expensive toll call. Dialer (or porn dialer) software
dials up a premium-rate telephone number such as a U.S. "900 number" and leaves the
line open, charging the toll to the infected user.
          Data-stealing malware is a web threat that divests victims of personal and
proprietary information with the intent of monetizing stolen data through direct use or
underground distribution. Content security threats that fall under this umbrella include
keyloggers, screen scrapers, spyware, adware, backdoors, and bots. The term does not
refer to activities such as spam, phishing, DNS poisoning, SEO abuse, etc. However,
when these threats result in file download or direct installation, as most hybrid attacks
do, files that act as agents to proxy information will fall into the data-stealing malware
category.
3.2 SMalL Ontology

         The SMalL Ontology is designed to aid the development of malware
prevention software by offering a common knowledge base and a clear classification
of the existing malicious software. It covers the categories and
subcategories of malware, organized by behavior, propagation method,
payload, motivation, etc.
         The ontology is divided into five main categories based on the major
malicious software threats: Crimeware, Spyware, Trojans, Viruses and Worms.
         A virus replicates by attaching its program instructions to an ordinary "host"
program or document, so that the virus instructions are executed when the host
program is executed. There are five main virus categories:
     File virus - uses the file system of a given OS (or more than one) to
         propagate. File viruses include viruses that infect executable files,
         companion viruses that create duplicates of files, viruses that copy
         themselves into various directories, and link viruses that exploit file system
         features.
     Boot sector virus - infects the boot sector or the master boot record, or
         displaces the active boot sector, of a hard drive. Once the hard drive is
         booted up, boot sector viruses load themselves into the computer's memory.
         Many boot sector viruses, once executed, prevent the OS from booting. Boot
         sector viruses were widespread in the 1990s, but have almost disappeared
         since the introduction of 32-bit processors and the near-disappearance of
         floppy disks as a storage medium for executables.
     Macro virus - written in the macro scripting languages of word processing,
         accounting, editing, or project applications, it propagates by exploiting the
         macro language's properties in order to transfer itself from the infected file
         containing the macro script to another file. The most widespread macro
         viruses are for Microsoft Office applications (Word, Excel, PowerPoint,
         Access). Because they are written in the code of application software, macro
         viruses are platform independent and can spread between Mac, Windows,
         Linux, and any other system running the targeted application.
     Email virus - refers to the delivery mechanism rather than the infection target
         or behavior. Email can be used to transmit any of the above types of virus:
         the virus copies itself and emails the copy to every address in the victim's
         email address book, usually as an email attachment. Each time a recipient
         opens the infected attachment, the virus harvests that victim's email address
         book and repeats its propagation process.
     Multi-variant virus - the same core virus but implemented with slight
         variations, so that an anti-virus scanner that can detect one variant will not be
         able to detect the other variants.
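
         The weakness described for multi-variant viruses can be illustrated with a
harmless sketch. It assumes a scanner that identifies samples by an exact SHA-256
digest of their bytes, which is only one possible detection scheme. The byte strings
and the names KNOWN_SIGNATURES and is_known are hypothetical placeholders and are
not part of SMalL or of any real antivirus product.

```python
import hashlib


def signature(sample: bytes) -> str:
    """Return the SHA-256 hex digest used here as an exact-match signature."""
    return hashlib.sha256(sample).hexdigest()


# Harmless placeholder bytes standing in for two variants of the same core code.
variant_a = b"core-routine;marker=1"
variant_b = b"core-routine;marker=2"  # differs from variant_a by a single byte

# Hypothetical signature database that only knows the first variant.
KNOWN_SIGNATURES = {signature(variant_a)}


def is_known(sample: bytes) -> bool:
    """Exact-match lookup: any change to the bytes yields a different digest."""
    return signature(sample) in KNOWN_SIGNATURES


print(is_known(variant_a))  # True
print(is_known(variant_b))  # False
```

A one-byte difference is enough to defeat the exact-match lookup, which is why a
scanner that knows one variant may miss the others.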

         A worm is a self-propagating program that spreads over a network, usually
the Internet. Unlike a virus, a worm may not depend on other programs or victim actions
(such as opening an infected email attachment or clicking on the Web link for a
malware Web site) for replication, dissemination, or execution. Worms spread by
locating other vulnerable potential hosts on the network (e.g., via scanning or
topological analysis), then copying their program instructions to those hosts. There
are five main categories of computer worms:
      Email worm - spreads via infected email attachments.
      Instant messaging worm - spreads via infected attachments to IM messages or
         via user access to Uniform Resource Locators (URLs) in IM messages that
         point to malicious Web sites from which the worm is downloaded.
      IRC worm - comparable to IM worms, but exploits IRC rather than IM
         channels.
      P2P worm - copies itself into a shared folder, then uses P2P mechanisms to
         announce its existence in the hope that other P2P users will download and
         execute it.
      Web worm - spreads via user access to a Web page, File Transfer Protocol
         (FTP) site, or other Internet resource.

         A Trojan horse is a destructive program that masquerades as a benign
program. Stealthware such as spyware, rootkits, keyloggers, trapdoors, and certain
adware represents a subset of Trojans that is intentionally designed to be hard to
detect or undetectable. Trojan horse software installs itself on the victim's computer
when the victim opens an email attachment or computer file containing the Trojan, or
clicks on a Web link that directs the victim's browser to a Web site from which the
Trojan is automatically downloaded. Once installed, the software can be controlled
remotely by hackers for criminal or other malicious purposes, such as extracting
money, passwords, or other sensitive information, or to create a zombie from which to
disseminate spam, phishing emails, the same Trojan, or other malware to other
computers on the network or the Internet. Trojan horses are classified into six categories:
      Backdoor Trojan (also known as Trapdoor Trojan or Remote-Access Trojan) -
         acts as a remote administration utility that enables control of the infected
         machine by a remote host.
      Data-collecting Trojan - surreptitiously collects and sends back information
         from the victim's machine. The surreptitious nature of such software has led
         to it being referred to as "stealthware."
      Downloader or Dropper - downloads, installs, and in the case of the
         Downloader, launches additional malware on the victim's machine.
      Proxy Trojan - turns the victim's computer into a proxy server (i.e., a
         zombie) that operates on behalf of the remote attacker. If the attacker's
         activities are detected and tracked, the trail leads back to the victim rather
         than to the attacker.
      Rootkit - a collection of programs used by a hacker to evade detection while
         trying to gain unauthorized access to the victim's computer. Rootkits are
         designed to hide processes, files or Windows Registry entries. Rootkits are
         used by hackers to hide their tracks or to insert threats surreptitiously on
         compromised computers. Various types of malware use rootkits to hide
         themselves on a computer.
      Bot - any type of malware (e.g., Trojan, worm, spyware bots or spybots) that
         enables the attacker to surreptitiously gain complete control of the infected
         machine. A computer that has been infected by a bot is referred to as a
         zombie or, sometimes, a drone. Bots may be further subcategorized
         according to their delivery mechanism. For example, a Spam bot is similar to
         an email virus or mass-mailing worm in that it relies on the intended victim's
         action to activate it, either by opening an attachment affixed to a spam email,
         or by clicking on a Web link within a spam email which points to a Web site
         from which the bot is downloaded to the victim's computer.

         Spyware represents non-Trojan stealthware that has the same objectives and
performs the same types of actions as spyware Trojans. A number of bots have
spyware capabilities, and are referred to as spybots. Spyware falls into two main
categories:
        Adware - software that automatically displays advertising material to the
         user, resulting in an unpleasant user experience. If malicious, adware usually
         exhibits the behaviors and/or infection techniques used by viruses, worms,
         and/or spyware.
        Tracking cookie - a cookie is a data structure that stores information about a
         user's browser session state. While cookies are a necessary component of
         how many Web sites operate, tracking cookies are specifically designed to
         track a user's behavior across multiple sites. Spyware sites routinely use
         tracking cookies to monitor a user's browsing behavior and associate it with
         the user's personal data such as name, credit card number, and other private
         information, which can then be harvested and sold to illicit marketers or
         cybercriminals.

          Crimeware is malware used in aid of criminal activities. Some types of
malware are used predominantly or exclusively as crimeware. Five main types of
crimeware are known:
     Email redirector - used to intercept and relay outgoing emails to the
          attacker's system.
     IM redirector - used to intercept and relay outgoing instant messages to the
          attacker's system.
     Clicker - redirects the victim to a Web site or Internet resource by sending
          the necessary commands to the victim's browser or replacing the system
          file(s) in which standard Internet URLs are stored (e.g., the Microsoft
          Windows hosts file).
     Transaction generator - targets not the end-user computer but a computer in
          the computer center of a corporation or financial institution. The software
          generates fraudulent transactions on behalf of the attacker within the victim
          organization's payment processing or other financial systems. In some
          instances, transaction generators are used to intercept credit card data for
          abuse by the attacker.
     Session hijacker - usually a malicious browser component that, after the
          victim logs in or begins a browser session, takes over that session to enable a
          hacker to exploit it, usually to perform criminal actions, such as transferring
          money from the victim's bank account.
Figure 1. SMalL Ontology
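
          The classification above can be summarized as a small lookup structure.
The following is a minimal sketch in Python, assuming a flat two-level hierarchy; the
actual ontology is an OWL model and may carry further properties. The names
SMALL_TAXONOMY and parent_category are hypothetical and chosen for illustration
only.

```python
# Main category -> subcategories, transcribed from the lists in this section.
SMALL_TAXONOMY = {
    "Virus": ["File virus", "Boot sector virus", "Macro virus",
              "Email virus", "Multi-variant virus"],
    "Worm": ["Email worm", "Instant messaging worm", "IRC worm",
             "P2P worm", "Web worm"],
    "Trojan": ["Backdoor Trojan", "Data-collecting Trojan",
               "Downloader or Dropper", "Proxy Trojan", "Rootkit", "Bot"],
    "Spyware": ["Adware", "Tracking cookie"],
    "Crimeware": ["Email redirector", "IM redirector", "Clicker",
                  "Transaction generator", "Session hijacker"],
}


def parent_category(subcategory: str):
    """Return the main category of a subcategory, or None if it is unknown."""
    wanted = subcategory.casefold()
    for category, children in SMALL_TAXONOMY.items():
        if any(child.casefold() == wanted for child in children):
            return category
    return None


print(parent_category("Rootkit"))  # Trojan
print(parent_category("clicker"))  # Crimeware
```

The lookup is case-insensitive so that "clicker" and "Clicker" resolve to the same
category.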
3.3 SMalL Java Application

          The SMalL Java Application is a tool designed to compare available
software security systems. It works in conjunction with the SMalL ontology to
give users better ways to examine the similarities and differences between
antivirus solutions.
          The application allows the user to add a new antivirus to the ontology and
link its properties to the available malware knowledge base. The user can afterwards
compare the security systems and see exactly which ones protect against a given type
of malware and which do not, which operating systems they run on, etc. The main
application windows are presented in Figure 2.1, Figure 2.2 and Figure 2.3.
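
          The comparison described above can be sketched as follows. This is not the
application's code, which is written in Java on top of the ontology; it is a
simplified Python illustration that assumes each antivirus is described only by the
operating systems it runs on and the malware categories it protects against. The
Antivirus record, the compare function and both products are invented for the
example.

```python
from dataclasses import dataclass, field


@dataclass
class Antivirus:
    """Hypothetical record for a security product added by the user."""
    name: str
    operating_systems: set = field(default_factory=set)
    protects_against: set = field(default_factory=set)


def compare(first: Antivirus, second: Antivirus) -> dict:
    """Split malware categories into shared and product-specific coverage."""
    return {
        "both": first.protects_against & second.protects_against,
        "only_" + first.name: first.protects_against - second.protects_against,
        "only_" + second.name: second.protects_against - first.protects_against,
    }


# Invented products and coverage, for illustration only.
alpha = Antivirus("AlphaAV", {"Windows"}, {"Virus", "Worm", "Trojan"})
beta = Antivirus("BetaAV", {"Windows", "Linux"}, {"Virus", "Spyware"})

result = compare(alpha, beta)
print(sorted(result["both"]))          # ['Virus']
print(sorted(result["only_AlphaAV"]))  # ['Trojan', 'Worm']
print(sorted(result["only_BetaAV"]))   # ['Spyware']
```

Set operations keep the comparison symmetric: the same function answers which
categories both products cover and which only one of them covers.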


3.4 SMalL File Format

          We believe that the file format for malware-related attacks can be an OWL
file created by extracting data relevant to the given attack directly from the SMalL
Ontology. For example, in the case of an adware attack, the file could contain the
antivirus used, the operating system it runs on, and an indication that the system
might also be infected with a Trojan. If so, and the antivirus did not find the
Trojan, supplementary scans are required to find the problem. When a system is
infected by several malware programs, a custom file can be created that relates the
problems to one another, so that on later occasions the antivirus can check for all of
them when one appears.
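
          As a rough illustration of what such a file might contain, the sketch below
writes the adware example as a plain RDF/XML fragment, the serialization commonly
used for OWL files. Everything specific in it is an assumption: the namespace
http://example.org/small# is a placeholder, and the property names involvesMalware,
detectedBy, runsOn and possiblyAccompaniedBy are hypothetical, since the paper does
not define the vocabulary of the format.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SMALL = "http://example.org/small#"  # placeholder namespace, not the real ontology IRI

ET.register_namespace("rdf", RDF)
ET.register_namespace("small", SMALL)


def build_report(attack_id, malware_class, antivirus, operating_system, related=()):
    """Return an RDF/XML string describing one attack as a single individual."""
    root = ET.Element(f"{{{RDF}}}RDF")
    attack = ET.SubElement(root, f"{{{SMALL}}}Attack",
                           {f"{{{RDF}}}about": SMALL + attack_id})
    # Property names below are hypothetical; the paper does not define them.
    for prop, value in [("involvesMalware", malware_class),
                        ("detectedBy", antivirus),
                        ("runsOn", operating_system)]:
        ET.SubElement(attack, f"{{{SMALL}}}{prop}",
                      {f"{{{RDF}}}resource": SMALL + value})
    # Malware that may accompany the reported attack and should also be scanned for.
    for other in related:
        ET.SubElement(attack, f"{{{SMALL}}}possiblyAccompaniedBy",
                      {f"{{{RDF}}}resource": SMALL + other})
    return ET.tostring(root, encoding="unicode")


report = build_report("attack-001", "Adware", "ExampleAV", "Windows",
                      related=["Trojan"])
print(report)
```

A consumer of the file could read the possiblyAccompaniedBy entries to decide which
supplementary scans to run.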



4 Conclusions

         We created an ontology for malicious software classification which is able to
aid the development of malware prevention software by offering a common
knowledge base and a clear classification of the existing security issues. We presented
an application prototype which handles antivirus software comparison based on the
information available in the ontology and user-entered data. We also proposed the
SMalL file format, which is a comprehensive way to report software security issues
and opens new possibilities for scanning for software security problems.
Figure 2.1 Main application window
Figure 2.2 Add new antivirus window
Figure 2.3 Antivirus comparison window

Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 

SMalL - Semantic Malware Log Based Reporter

  • 1. SMalL - Semantic Malware Log-based reporter Stefan Ceriu, Stefan Prutianu Faculty of Computer Science, „Al. I. Cuza“ University, Iasi, Romania { stefan.ceriu, stefan.prutianu}@info.uaic.ro Abstract. In this paper we present the SMalL Ontology for malicious software classification, SMalL Java Application for antivirus systems comparison and the SMalL knowledge based file format for malware related attacks. We believe that our ontology is able to aid the development of malware prevention software by offering a common knowledge base and a clear classification of the existing malicious software. The application is a prototype regarding how this ontology might be used in conjunction with known antivirus capabilities to offer a comprehensive comparison. Keywords: malware, semantic web, jena, owl, protégé, ontology, virus, worm, Trojan, spyware, crimeware; 1 Introduction Malware, also known as malicious code and malicious software, refers to a program that is inserted into a system, usually covertly, with the intent of compromising the confidentiality, integrity, or availability of the victim‘s data, applications, or operating system or otherwise annoying or disrupting the victim. Malware has become the most significant external threat to most systems, causing widespread damage and disruption, and necessitating extensive recovery efforts within most organizations. Spyware malware intended to violate a user‘s privacy has also become a major concern to organizations. Although privacy-violating malware has been in use for many years, it has become much more widespread recently, with spyware invading many systems to monitor personal activities and conduct financial fraud. Organizations also face similar threats from a few forms of non-malware threats that are often associated with malware. One of these forms that has become commonplace is phishing, which is using deceptive computer-based means to trick individuals into disclosing sensitive information. 
Another common form is virus hoaxes, which are false warnings of new malware threats. In what follows, we look at ways to classify the different types of malware by means of a new ontology, and at an application designed to work with that ontology to compare the available antivirus systems.
  • 2. 2 Ontologies and OWL

2.1 Overview

The term ontology originates from philosophy. In that context, it is used as the name of a subfield of philosophy, namely, the study of the nature of existence, the branch of metaphysics concerned with identifying, in the most general terms, the kinds of things that actually exist, and how to describe them. For example, the observation that the world is made up of specific objects that can be grouped into abstract classes based on shared properties is a typical ontological commitment.

However, in more recent years, ontology has become one of the many words hijacked by computer science and given a specific technical meaning that is rather different from the original one. Instead of "ontology" we now speak of "an ontology."

In general, an ontology formally describes a domain of discourse. Typically, an ontology consists of a finite list of terms and the relationships between these terms. The terms denote important concepts (classes of objects) of the domain. For example, in a university setting, staff members, students, courses, lecture theaters, and disciplines are some important concepts. The relationships typically include hierarchies of classes. A hierarchy specifies a class C to be a subclass of another class S if every object in C is also included in S. For example, all faculty members are staff members.

Apart from subclass relationships, ontologies may include information such as:
• properties (X teaches Y)
• value restrictions (only faculty members may teach courses)
• disjointness statements (faculty and general staff are disjoint)
• specifications of logical relationships between objects (every department must include at least ten faculty members).

In the context of the Web, ontologies provide a shared understanding of a domain. Such a shared understanding is necessary to overcome differences in terminology. One application's zip code may be the same as another application's area code.
Another problem is that two applications may use the same term with different meanings. In university A, a course may refer to a degree (like computer science), while in university B it may mean a single subject (CS 101). Such differences can be overcome by mapping the particular terminology to a shared ontology or by defining direct mappings between the ontologies. In either case, it is easy to see that ontologies support semantic interoperability. Ontologies are useful for the organization and navigation of Web sites. Many web sites today expose on the left-hand side of the page the top levels of a concept hierarchy of terms. The user may click on one of them to expand the subcategories. Also, ontologies are useful for improving the accuracy of Web searches. The search engines can look for pages that refer to a precise concept in an ontology instead of collecting all pages in which certain, generally ambiguous, keywords occur. In this way, differences in terminology between Web pages and the queries can be overcome. In addition, Web searches can exploit generalization/specialization information. If a query fails to find any relevant documents, the search engine may suggest to the user a more general query. It is even conceivable for the engine to run
  • 3. such queries proactively to reduce the reaction time in case the user adopts a suggestion. Or if too many answers are retrieved, the search engine may suggest to the user some specializations.

The Web Ontology Working Group of W3C identified a number of characteristic use cases for the Semantic Web that would require much more expressiveness than RDF and RDF Schema offer. A number of research groups in both the United States and Europe had already identified the need for a more powerful ontology modeling language. This led to a joint initiative to define a richer language, called DAML+OIL (the name combines the names of the U.S. proposal DAML-ONT and the European language OIL). DAML+OIL in turn was taken as the starting point for the W3C Web Ontology Working Group in defining OWL, the language that is intended to be the standardized and broadly accepted ontology language of the Semantic Web.

Ontology languages allow users to write explicit, formal conceptualizations of domain models. The main requirements are a well-defined syntax, efficient reasoning support, a formal semantics, sufficient expressive power and convenience of expression.

The importance of a well-defined syntax is clear and known from the area of programming languages; it is a necessary condition for machine processing of information. All the languages we have presented so far have a well-defined syntax. DAML+OIL and OWL build upon RDF and RDFS and have the same kind of syntax. Of course, it is questionable whether the XML-based RDF syntax is very user-friendly; there are alternatives better suited to human users (for example, see the OIL syntax). However, this drawback is not very significant because ultimately users will be developing their own ontologies using authoring tools, or more generally, ontology development tools, instead of writing them directly in DAML+OIL or OWL.

A formal semantics describes the meaning of knowledge precisely.
Precisely here means that the semantics does not refer to subjective intuitions, nor is it open to different interpretations by different people (or machines). The importance of a formal semantics is well-established in the domain of mathematical logic, for instance. One use of a formal semantics is to allow people to reason about the knowledge. For ontological knowledge, we may reason about the following:
• Class membership. If x is an instance of a class C, and C is a subclass of D, then we can infer that x is an instance of D.
• Equivalence of classes. If class A is equivalent to class B, and class B is equivalent to class C, then A is equivalent to C, too.
• Consistency. Suppose we have declared x to be an instance of the class A and that A is a subclass of B ∩ C, A is a subclass of D, and B and D are disjoint. Then we have an inconsistency because A should be empty but has the instance x. This is an indication of an error in the ontology.
• Classification. If we have declared that certain property-value pairs are a sufficient condition for membership in a class A, then if an individual x satisfies such conditions, we can conclude that x must be an instance of A.

Semantics is a prerequisite for reasoning support. Derivations such as the preceding ones can be made mechanically instead of being made by hand.
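The class-membership and consistency derivations listed above can be illustrated with a minimal Python sketch. It is not how FaCT, RACER or any OWL reasoner is implemented; it only follows subclass links and checks declared disjointness. The class names Worm, Malware and Software and the individual sample1 are hypothetical, while A, B, C, D and x mirror the consistency example in the text.

```python
def superclasses(cls, subclass_of):
    """Return every class that `cls` is (transitively) a subclass of."""
    seen, stack = set(), [cls]
    while stack:
        for parent in subclass_of.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def inferred_types(individual, asserted_type, subclass_of):
    """Class membership: the asserted class plus all of its superclasses."""
    cls = asserted_type[individual]
    return {cls} | superclasses(cls, subclass_of)


def is_consistent(individual, asserted_type, subclass_of, disjoint_pairs):
    """False if the individual ends up in two classes declared disjoint."""
    types = inferred_types(individual, asserted_type, subclass_of)
    return not any(a in types and b in types for a, b in disjoint_pairs)


# Class membership (hypothetical names): sample1 is a Worm, Worm is a
# subclass of Malware, Malware is a subclass of Software.
hierarchy = {"Worm": {"Malware"}, "Malware": {"Software"}}
membership = inferred_types("sample1", {"sample1": "Worm"}, hierarchy)

# Consistency example from the text: A is a subclass of B and C (B ∩ C)
# and of D, while B and D are disjoint, so the instance x is an error.
abcd = {"A": {"B", "C", "D"}}
consistent = is_consistent("x", {"x": "A"}, abcd, [("B", "D")])

print(sorted(membership))
print(consistent)
```

A real reasoner derives far more than this (equivalence, classification from property restrictions), but the sketch shows why such checks can be run mechanically once the semantics is fixed.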
  • 4. Reasoning support is important because it allows one to:
• check the consistency of the ontology and the knowledge
• check for unintended relationships between classes
• automatically classify instances in classes

Automated reasoning support allows one to check many more cases than could be checked manually. Checks like the preceding ones are valuable for designing large ontologies, where multiple authors are involved, and for integrating and sharing ontologies from various sources.

A formal semantics and reasoning support are usually provided by mapping an ontology language to a known logical formalism, and by using automated reasoners that already exist for those formalisms. OWL is (partially) mapped on description logic, and makes use of existing reasoners such as FaCT and RACER. Description logics are a subset of predicate logic for which efficient reasoning support is possible.

RDF and RDFS allow the representation of some ontological knowledge. The main modeling primitives of RDF/RDFS concern the organization of vocabularies in typed hierarchies: subclass and sub-property relationships, domain and range restrictions, and instances of classes. However, a number of other features are missing. Here we list a few:
• Local scope of properties. rdfs:range defines the range of a property, say eats, for all classes. Thus in RDF Schema we cannot declare range restrictions that apply to some classes only. For example, we cannot say that cows eat only plants, while other animals may eat meat too.
• Disjointness of classes. Sometimes we wish to say that classes are disjoint. For example, male and female are disjoint. But in RDF Schema we can only state subclass relationships, e.g., female is a subclass of person.
• Boolean combinations of classes. Sometimes we wish to build new classes by combining other classes using union, intersection, and complement. For example, we may wish to define the class person to be the disjoint union of the classes male and female. RDF Schema does not allow such definitions.
• Cardinality restrictions. Sometimes we wish to place restrictions on how many distinct values a property may or must take. For example, we would like to say that a person has exactly two parents, or that a course is taught by at least one lecturer. Again, such restrictions are impossible to express in RDF Schema.
• Special characteristics of properties. Sometimes it is useful to say that a property is transitive (like "greater than"), unique (like "is mother of"), or the inverse of another property (like "eats" and "is eaten by").

Thus we need an ontology language that is richer than RDF Schema, a language that offers these features and more. In designing such a language one should be aware of the trade-off between expressive power and efficient reasoning support. Generally speaking, the richer the language, the more inefficient the reasoning support becomes, often crossing the border of non-computability. Thus we need a compromise, a language that can be supported by reasonably efficient reasoners while being sufficiently expressive to express large classes of ontologies and knowledge.
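Two of the features listed above, a range restriction that applies to one class only and an exact cardinality restriction, can be sketched as plain checks over subject-predicate-object triples. This is a simplified closed-world illustration in Python, not OWL semantics: an OWL reasoner works under the open-world assumption and would not treat a missing value as a violation. All individuals (daisy, rex, alice and so on) and the helper names are hypothetical.

```python
# Hypothetical facts as (subject, predicate, object) triples.
triples = [
    ("daisy", "type", "Cow"),
    ("grass", "type", "Plant"),
    ("daisy", "eats", "grass"),
    ("rex", "type", "Dog"),
    ("bone", "type", "Meat"),
    ("rex", "eats", "bone"),
    ("alice", "type", "Person"),
    ("alice", "hasParent", "bob"),
    ("alice", "hasParent", "carol"),
]


def values(node, prop, facts):
    """All objects linked to `node` through `prop`."""
    return {o for s, p, o in facts if s == node and p == prop}


def members(cls, facts):
    """All individuals asserted to be of type `cls`."""
    return {s for s, p, o in facts if p == "type" and o == cls}


def violates_only(cls, prop, allowed, facts):
    """Members of `cls` with a `prop` value not typed as `allowed`.

    Models a restriction local to one class ("cows eat only plants")
    without constraining what other classes may eat.
    """
    return {
        m for m in members(cls, facts)
        if any(allowed not in values(v, "type", facts)
               for v in values(m, prop, facts))
    }


def violates_exactly(cls, prop, n, facts):
    """Members of `cls` whose number of known `prop` values is not `n`."""
    return {m for m in members(cls, facts)
            if len(values(m, prop, facts)) != n}


print(violates_only("Cow", "eats", "Plant", triples))
print(violates_exactly("Person", "hasParent", 2, triples))
```

The dog eating meat is not reported because the restriction is scoped to Cow, which is the kind of local statement that a global rdfs:range cannot express.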
  • 5. 2.2 Protégé

Knowledge about the application domain is one of the most important cornerstones of successful software projects. We must gather at least a basic understanding of the concepts relevant to our customers before we can begin coding. For example, we need to know how a customer's business processes work before we can develop a warehouse management system; we need to know that users who buy cat food might also be interested in cat litter before we can implement purchase recommendations for an online shop. We acquire such knowledge from domain experts and capture it in some kind of domain model.

In simple cases, we can scribble these models on paper. This approach works fine for small projects and when the experts help us decipher their handwriting. But it is better to have models that translate directly into a Java program. For instance, we can use the Unified Modeling Language (UML) to sketch the domain models with class diagrams and use cases. UML is quite good for quickly getting to an implementation, but it is basically a language for object-oriented programming that few domain experts fully understand. It also consists of a fixed set of modeling constructs (such as classes and attributes) that are not very useful when domain experts would rather talk about specific business processes and products.

The Protégé-OWL editor is an extension of Protégé that supports the Web Ontology Language (OWL). OWL is the most recent development in standard ontology languages, endorsed by the World Wide Web Consortium (W3C) to promote the Semantic Web vision. An OWL ontology may include descriptions of classes, properties and their instances. Given such an ontology, the OWL formal semantics specifies how to derive its logical consequences, i.e. facts not literally present in the ontology, but entailed by the semantics. These entailments may be based on a single document or multiple distributed documents that have been combined using defined OWL mechanisms.
The Protégé-OWL editor enables users to:
• Load and save OWL and RDF ontologies.
• Edit and visualize classes, properties, and SWRL rules.
• Define logical class characteristics as OWL expressions.
• Execute reasoners such as description logic classifiers.
• Edit OWL individuals for Semantic Web markup.

Protégé-OWL's flexible architecture makes it easy to configure and extend the tool. It is tightly integrated with Jena and has an open-source Java API for the development of custom-tailored user interface components or arbitrary Semantic Web services.

From a programmer's perspective, one of Protégé's most attractive features is that it provides an open source API to plug in your own Java components and access the domain models from your application. As a result, you can develop systems very rapidly: just start with the underlying domain model, let Protégé generate the basic user interface, and then gradually write widgets and plug-ins to customize look-and-feel and behavior.
  • 6. Individuals represent objects in the domain in which we are interested. An important difference between Protégé and OWL is that OWL does not use the Unique Name Assumption (UNA). This means that two different names could actually refer to the same individual. For example, "Queen Elizabeth", "The Queen" and "Elizabeth Windsor" might all refer to the same individual. In OWL, it must be explicitly stated that individuals are the same as each other, or different to each other; otherwise they might be the same as each other, or they might be different to each other.

Properties are binary relations on individuals, i.e. properties link two individuals together. For example, the property hasSibling might link the individual Matthew to the individual Gemma, or the property hasChild might link the individual Peter to the individual Matthew. Properties can have inverses. For example, the inverse of hasOwner is isOwnedBy. Properties can be limited to having a single value, i.e. to being functional. They can also be either transitive or symmetric.

OWL classes are interpreted as sets that contain individuals. They are described using formal (mathematical) descriptions that state precisely the requirements for membership of the class. For example, the class Cat would contain all the individuals that are cats in our domain of interest. Classes may be organised into a superclass-subclass hierarchy, which is also known as a taxonomy. Subclasses specialize ('are subsumed by') their superclasses. For example, consider the classes Animal and Cat: Cat might be a subclass of Animal (so Animal is the superclass of Cat). This says that 'All cats are animals', 'All members of the class Cat are members of the class Animal', 'Being a Cat implies that you're an Animal', and 'Cat is subsumed by Animal'. One of the key features of OWL-DL is that these superclass-subclass relationships (subsumption relationships) can be computed automatically by a reasoner.
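The property characteristics described above (symmetric, inverse and transitive properties) can be illustrated by deriving the extra facts they entail from a small set of asserted ones. The following Python sketch is only an illustration and is not the Protégé or Jena API. The individuals Matthew, Gemma and Peter and the properties hasSibling and hasChild come from the text; hasParent, hasAncestor and the individual William are hypothetical additions.

```python
def apply_symmetric(facts, prop):
    """If prop(a, b) holds for a symmetric property, add prop(b, a)."""
    return facts | {(o, prop, s) for s, p, o in facts if p == prop}


def apply_inverse(facts, prop, inverse):
    """If prop(a, b) holds, add inverse(b, a), and the other way round."""
    forward = {(o, inverse, s) for s, p, o in facts if p == prop}
    backward = {(o, prop, s) for s, p, o in facts if p == inverse}
    return facts | forward | backward


def apply_transitive(facts, prop):
    """Close prop under transitivity: prop(a, b), prop(b, c) => prop(a, c)."""
    closed = set(facts)
    changed = True
    while changed:
        changed = False
        pairs = [(s, o) for s, p, o in closed if p == prop]
        for a, b in pairs:
            for c, d in pairs:
                if b == c and (a, prop, d) not in closed:
                    closed.add((a, prop, d))
                    changed = True
    return closed


asserted = {
    ("Matthew", "hasSibling", "Gemma"),
    ("Peter", "hasChild", "Matthew"),
    ("Matthew", "hasAncestor", "Peter"),   # hypothetical property
    ("Peter", "hasAncestor", "William"),   # hypothetical individual
}

inferred = apply_symmetric(asserted, "hasSibling")
inferred = apply_inverse(inferred, "hasChild", "hasParent")
inferred = apply_transitive(inferred, "hasAncestor")

for fact in sorted(inferred - asserted):
    print(fact)
```

Note that the sketch treats each name as a distinct individual. Because OWL does not make the Unique Name Assumption, a real reasoner would additionally need explicit same-as or different-from statements before relying on that.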
In OWL, classes are built up of descriptions that specify the conditions that must be satisfied by an individual for it to be a member of the class. OWL classes are assumed to 'overlap'. We therefore cannot assume that an individual is not a member of a particular class simply because it has not been asserted to be a member of that class. In order to 'separate' a group of classes we must make them disjoint from one another. This ensures that an individual who has been asserted to be a member of one of the classes in the group cannot be a member of any other classes in that group.

One of the key features of ontologies that are described using OWL-DL is that they can be processed by a reasoner. One of the main services offered by a reasoner is to test whether or not one class is a subclass of another class. By performing such tests on the classes in an ontology it is possible for a reasoner to compute the inferred ontology class hierarchy. Another standard service that is offered by reasoners is consistency checking. Based on the description (conditions) of a class the reasoner can check whether or not it is possible for the class to have any instances. A class is deemed to be inconsistent if it cannot possibly have any instances.

Protégé allows different OWL reasoners to be plugged in; the reasoner shipped with Protégé is called FaCT++. The ontology can be 'sent to the reasoner' to automatically compute the classification hierarchy and also to check the logical consistency of the ontology. In Protégé the 'manually constructed' class hierarchy is called the asserted hierarchy. The class hierarchy that is automatically computed by the reasoner is called the inferred hierarchy. Being able to use a reasoner to
  • 7. automatically compute the class hierarchy is one of the major benefits of building an ontology using the OWL-DL sub-language. When constructing very large ontologies (with upwards of several thousand classes in them) the use of a reasoner to compute subclass-superclass relationships between classes becomes almost vital. Without a reasoner it is very difficult to keep large ontologies in a maintainable and logically correct state. In cases where ontologies can have classes that have many superclasses (multiple inheritance) it is nearly always a good idea to construct the class hierarchy as a simple tree. Classes in the asserted hierarchy (manually constructed hierarchy) therefore have no more than one superclass. Computing and maintaining multiple inheritance is the job of the reasoner. This technique helps to keep the ontology in a maintainable and modular state. Not only does this promote the reuse of the ontology by other ontologies and applications, it also minimizes human errors that are inherent in maintaining a multiple inheritance hierarchy.

3 Malware

3.1 Overview

Malware, short for malicious software, is software designed to infiltrate a computer system without the owner's informed consent. The expression is a general term used by computer professionals to mean a variety of forms of hostile, intrusive, or annoying software or program code. The term "computer virus" is sometimes used as a catch-all phrase to include all types of malware, including true viruses. Software is considered malware based on the perceived intent of the creator rather than any particular features. Malware includes computer viruses, worms, Trojan horses, most rootkits, spyware, dishonest adware, crimeware and other malicious and unwanted software. In law, malware is sometimes known as a computer contaminant, for instance in the legal codes of several U.S. states, including California and West Virginia.
Malware is not the same as defective software, that is, software that has a legitimate purpose but contains harmful bugs. Preliminary results from Symantec published in 2008 suggested that “the release rate of malicious code and other unwanted programs may be exceeding that of legitimate software applications”. According to F-Secure, "as much malware [was] produced in 2007 as in the previous 20 years altogether." Malware's most common pathway from criminals to users is through the Internet: primarily by e-mail and the World Wide Web. The prevalence of malware as a vehicle for organized Internet crime, along with the general inability of traditional anti-malware protection platforms to protect against the continuous stream of unique and newly produced professional malware, has seen the adoption of a new mindset for businesses operating on the Internet - the acknowledgment that some sizable percentage of Internet customers will always be infected for some reason or other, and that they need to continue doing business with infected customers. The result is a greater emphasis on back-office systems designed
  • 8. to spot fraudulent activities associated with advanced malware operating on customers' computers.

Many early infectious programs, including the first Internet Worm and a number of MS-DOS viruses, were written as experiments or pranks generally intended to be harmless or merely annoying rather than to cause serious damage to computers. In some cases the perpetrators did not realize how much harm their creations could do. Young programmers learning about viruses and their techniques wrote them simply because they could, or to see how far they could spread. As late as 1999, widespread viruses such as the Melissa virus appear to have been written chiefly as pranks.

Hostile intent related to vandalism can be found in programs designed to cause harm or data loss. Many DOS viruses, and the Windows ExploreZip worm, were designed to destroy files on a hard disk, or to corrupt the file system by writing invalid data. Network-borne worms such as the 2001 Code Red worm or the Ramen worm fall into the same category. Designed to vandalize web pages, worms may seem like the online equivalent to graffiti tagging, with the author's alias or affinity group appearing everywhere the worm goes.

However, since the rise of widespread broadband Internet access, malicious software has come to be designed for a profit motive, either more or less legal (forced advertising) or criminal. For instance, since 2003, the majority of widespread viruses and worms have been designed to take control of users' computers for black-market exploitation. Infected "zombie computers" are used to send email spam, to host contraband data such as child pornography, or to engage in distributed denial-of-service attacks as a form of extortion.

Another strictly for-profit category of malware has emerged in spyware: programs designed to monitor users' web browsing, display unsolicited advertisements, or redirect affiliate marketing revenues to the spyware creator.
Spyware programs do not spread like viruses; they are, in general, installed by exploiting security holes or are packaged with user-installed software, such as peer-to-peer applications.

The best-known types of malware, viruses and worms, are known for the manner in which they spread, rather than any other particular behavior. The term computer virus is used for a program that has infected some executable software and that causes that software, when run, to spread the virus to other executable software. Viruses may also contain a payload that performs other actions, often malicious. A worm, on the other hand, is a program that actively transmits itself over a network to infect other computers. It too may carry a payload.

These definitions lead to the observation that a virus requires user intervention to spread, whereas a worm spreads automatically. Using this distinction, infections transmitted by email or Microsoft Word documents, which rely on the recipient opening a file or email to infect the system, would be classified as viruses rather than worms. Some writers in the trade and popular press appear to misunderstand this distinction, and use the terms interchangeably.

For a malicious program to accomplish its goals, it must be able to do so without being shut down, or deleted by the user or administrator of the computer on which it is running. Concealment can also help get the malware installed in the first place. When a malicious program is disguised as something innocuous or desirable,
users may be tempted to install it without knowing what it does. This is the technique of the Trojan horse or Trojan.

In broad terms, a Trojan horse is any program that invites the user to run it, concealing a harmful or malicious payload. The payload may take effect immediately and can lead to many undesirable effects, such as deleting the user's files or further installing malicious or undesirable software. Trojan horses known as droppers are used to start off a worm outbreak, by injecting the worm into users' local networks.

One of the most common ways that spyware is distributed is as a Trojan horse, bundled with a piece of desirable software that the user downloads from the Internet. When the user installs the software, the spyware is installed alongside. Spyware authors who attempt to act in a legal fashion may include an end-user license agreement that states the behavior of the spyware in loose terms, which the users are unlikely to read or understand.

Once a malicious program is installed on a system, it is essential that it stay concealed, to avoid detection and disinfection. The same is true when a human attacker breaks into a computer directly. Techniques known as rootkits allow this concealment, by modifying the host operating system so that the malware is hidden from the user. Rootkits can prevent a malicious process from being visible in the system's list of processes, or keep its files from being read. Originally, a rootkit was a set of tools installed by a human attacker on a Unix system where the attacker had gained administrator (root) access. Today, the term is used more generally for concealment routines in a malicious program.

Some malicious programs contain routines that defend against removal: they do not merely hide themselves, but repel attempts to remove them. An early example of this behavior is recorded in the Jargon File tale of a pair of programs infesting a Xerox CP-V timesharing system.
Each ghost-job would detect the fact that the other had been killed, and would start a new copy of the recently slain program within a few milliseconds. The only way to kill both ghosts was to kill them simultaneously (very difficult) or to deliberately crash the system. Similar techniques are used by some modern malware, wherein the malware starts a number of processes that monitor and restore one another as needed.

A backdoor is a method of bypassing normal authentication procedures. Once a system has been compromised (by one of the above methods, or in some other way), one or more backdoors may be installed in order to allow easier access in the future. Backdoors may also be installed prior to malicious software, to allow attackers entry. The idea has often been suggested that computer manufacturers preinstall backdoors on their systems to provide technical support for customers, but this has never been reliably verified. Crackers typically use backdoors to secure remote access to a computer, while attempting to remain hidden from casual inspection. To install backdoors, crackers may use Trojan horses, worms, or other methods.

During the 1980s and 1990s, it was usually taken for granted that malicious programs were created as a form of vandalism or prank. More recently, the greater share of malware programs has been written with a financial or profit motive in mind. This can be taken as the malware authors' choice to monetize their control over infected systems: to turn that control into a source of revenue.
Spyware programs are commercially produced for the purpose of gathering information about computer users, showing them pop-up ads, or altering web-browser behavior for the financial benefit of the spyware creator. For instance, some spyware programs redirect search engine results to paid advertisements. Others, often called "stealware" by the media, overwrite affiliate marketing codes so that revenue is redirected to the spyware creator rather than the intended recipient.

Spyware programs are sometimes installed as Trojan horses of one sort or another. They differ in that their creators present themselves openly as businesses, for instance by selling advertising space on the pop-ups created by the malware. Most such programs present the user with an end-user license agreement that purportedly protects the creator from prosecution under computer contaminant laws. However, spyware EULAs have not yet been upheld in court.

Another way that financially motivated malware creators can profit from their infections is to directly use the infected computers to do work for the creator. The infected computers are used as proxies to send out spam messages. A computer left in this state is often known as a zombie computer. The advantage to spammers of using infected computers is that they provide anonymity, protecting the spammer from prosecution. Spammers have also used infected PCs to target anti-spam organizations with distributed denial-of-service attacks.

In order to coordinate the activity of many infected computers, attackers have used coordinating systems known as botnets. In a botnet, the malware or malbot logs in to an Internet Relay Chat channel or other chat system. The attacker can then give instructions to all the infected systems simultaneously. Botnets can also be used to push upgraded malware to the infected systems, keeping them resistant to antivirus software or other security measures.
It is possible for a malware creator to profit by stealing sensitive information from a victim. Some malware programs install a keylogger, which intercepts the user's keystrokes when entering a password, credit card number, or other information that may be exploited. This is then transmitted to the malware creator automatically, enabling credit card fraud and other theft. Similarly, malware may copy the CD key or password for online games, allowing the creator to steal accounts or virtual items.

Another way of stealing money from the infected PC owner is to take control of a dial-up modem and dial an expensive toll call. Dialer (or porn dialer) software dials up a premium-rate telephone number such as a U.S. "900 number" and leaves the line open, charging the toll to the infected user.

Data-stealing malware is a web threat that divests victims of personal and proprietary information with the intent of monetizing stolen data through direct use or underground distribution. Content security threats that fall under this umbrella include keyloggers, screen scrapers, spyware, adware, backdoors, and bots. The term does not refer to activities such as spam, phishing, DNS poisoning, or SEO abuse. However, when these threats result in file download or direct installation, as most hybrid attacks do, files that act as agents to proxy information will fall into the data-stealing malware category.
3.2 SMalL Ontology

The SMalL Ontology is designed to aid the development of malware prevention software by offering a common knowledge base and a clear classification of the existing malicious software. It covers the different categories and subcategories of malware and is organized by behavior, propagation method, payload, motivation, etc. The ontology is divided into five main categories based on the major malicious software threats: Crimeware, Spyware, Trojans, Viruses and Worms.

A virus replicates by attaching its program instructions to an ordinary "host" program or document, so that the virus instructions are executed when the host program is executed. There are five main virus categories:

• File virus - uses the file system of a given OS (or more than one) to propagate. File viruses include viruses that infect executable files, companion viruses that create duplicates of files, viruses that copy themselves into various directories, and link viruses that exploit file system features.
• Boot sector virus - infects the boot sector or the master boot record, or displaces the active boot sector, of a hard drive. Once the hard drive is booted up, boot sector viruses load themselves into the computer's memory. Many boot sector viruses, once executed, prevent the OS from booting. Boot sector viruses were widespread in the 1990s, but have almost disappeared since the introduction of 32-bit processors and the near-disappearance of floppy disks as a storage medium for executables.
• Macro virus - written in the macro scripting languages of word processing, accounting, editing, or project applications, it propagates by exploiting the macro language's properties in order to transfer itself from the infected file containing the macro script to another file. The most widespread macro viruses are for Microsoft Office applications (Word, Excel, PowerPoint, Access).
Because they are written in the code of application software, macro viruses are platform independent and can spread between Mac, Windows, Linux, and any other system running the targeted application.
• Email virus - refers to the delivery mechanism rather than the infection target or behavior. Email can be used to transmit any of the above types of virus: the virus copies itself and emails the copy to every address in the victim's email address book, usually within an email attachment. Each time a recipient opens the infected attachment, the virus harvests that victim's email address book and repeats its propagation process.
• Multi-variant virus - the same core virus implemented with slight variations, so that an anti-virus scanner that can detect one variant will not be able to detect the other variants.

Worms are self-propagating programs that spread over a network, usually the Internet. Unlike viruses, they may not depend on other programs or victim actions (such as opening an infected email attachment or clicking on the Web link for a malware Web site) for replication, dissemination, or execution. Worms spread by locating other vulnerable potential hosts on the network (e.g., via scanning or
topological analysis), then copying their program instructions to those hosts. There are five main categories of computer worms:

• Email worm - spreads via infected email attachments.
• Instant messaging worm - spreads via infected attachments to IM messages, or via reader access to Uniform Resource Locators (URLs) in IM messages that point to malicious Web sites from which the worm is downloaded.
• IRC worm - comparable to IM worms, but exploits IRC rather than IM channels.
• P2P worm - copies itself into a shared folder, then uses P2P mechanisms to announce its existence in the hope that other P2P users will download and execute it.
• Web worm - spreads via user access to a Web page, File Transfer Protocol (FTP) site, or other Internet resource.

A Trojan horse is a destructive program that masquerades as a benign program. Stealthware such as spyware, rootkits, keyloggers, trapdoors, and certain adware represents a subset of Trojans that is intentionally designed to be hard to detect or undetectable. Trojan horse software installs itself on the victim's computer when the victim opens an email attachment or computer file containing the Trojan, or clicks on a Web link that directs the victim's browser to a Web site from which the Trojan is automatically downloaded. Once installed, the software can be controlled remotely by hackers for criminal or other malicious purposes, such as extracting money, passwords, or other sensitive information, or to create a zombie from which to disseminate spam, phishing emails, the same Trojan, or other malware to other computers on the network or the Internet. Trojan horses are classified in six categories:

• Backdoor Trojan (also known as Trapdoor Trojan or Remote-Access Trojan) - acts as a remote administration utility that enables control of the infected machine by a remote host.
• Data-collecting Trojan - surreptitiously collects and sends back information from the victim's machine.
The surreptitious nature of such software has led to it being referred to as "stealthware."
• Downloader or Dropper - downloads, installs, and in the case of the Downloader, launches additional malware on the victim's machine.
• Proxy Trojan - turns the victim's computer into a proxy server (i.e., a zombie) that operates on behalf of the remote attacker. If the attacker's activities are detected and tracked, the trail leads back to the victim rather than to the attacker.
• Rootkit - a collection of programs used by a hacker to evade detection while trying to gain unauthorized access to the victim's computer. Rootkits are designed to hide processes, files or Windows Registry entries. Rootkits are used by hackers to hide their tracks or to insert threats surreptitiously on compromised computers. Various types of malware use rootkits to hide themselves on a computer.
• Bot - any type of malware (e.g., Trojan, worm, spyware bots or spybots) that enables the attacker to surreptitiously gain complete control of the infected machine. A computer that has been infected by a bot is referred to as a
zombie or, sometimes, a drone. Bots may be further subcategorized according to their delivery mechanism. For example, a Spam bot is similar to an email virus or mass-mailing worm in that it relies on the intended victim's action to activate it, either by opening an attachment affixed to a spam email, or by clicking on a Web link within a spam email which points to a Web site from which the bot is downloaded to the victim's computer.

Spyware represents non-Trojan stealthware that has the same objectives and performs the same types of actions as spyware Trojans. A number of bots have spyware capabilities, and are referred to as spybots. Spyware is divided into two main categories:

• Adware - software that automatically displays advertising material to the user, resulting in an unpleasant user experience. If malicious, adware usually exhibits the behaviors and/or infection techniques used by viruses, worms, and/or spyware.
• Tracking cookie - a cookie is a data structure that stores information about a user's browser session state. While cookies are a necessary component of how many Web sites operate, tracking cookies are specifically designed to track a user's behavior across multiple sites. Spyware sites routinely use tracking cookies to monitor a user's browsing behavior and associate it with the user's personal data such as name, credit card number, and other private information, which can then be harvested and sold to illicit marketers or cybercriminals.

Crimeware is malware used in aid of criminal activities. Although any malware can serve that purpose, there are specific types of malware used predominantly or exclusively as crimeware. Five main types of crimeware are known:

• Email redirector - used to intercept and relay outgoing emails to the attacker's system.
• IM redirector - used to intercept and relay outgoing instant messages to the attacker's system.
• Clicker - redirects the victim to a Web site or Internet resource by sending the necessary commands to the victim's browser or replacing the system file(s) in which standard Internet URLs are stored (e.g., the Microsoft Windows hosts file).
• Transaction generator - targets not the end-user computer but the computer of a corporate or financial institution's computer center. The software generates fraudulent transactions on behalf of the attacker within the victim organization's payment processing or other financial systems. In some instances, transaction generators are used to intercept credit card data for abuse by the attacker.
• Session hijacker - usually a malicious browser component that, after the victim logs in or begins a browser session, takes over that session to enable a hacker to exploit it, usually to perform criminal actions, such as transferring money from the victim's bank account.
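
The classification above can be pictured as a two-level hierarchy. The following is only a minimal sketch, assuming a plain Python dictionary in place of the actual OWL class hierarchy; the names `SMALL_TAXONOMY`, `parent_category` and `is_a` are hypothetical and do not come from the SMalL ontology file or the Java application.

```python
# Hypothetical in-memory mirror of the SMalL categories listed above.
# The real ontology is an OWL file; only the category names are taken from the text.
SMALL_TAXONOMY = {
    "Virus": ["File virus", "Boot sector virus", "Macro virus",
              "Email virus", "Multi-variant virus"],
    "Worm": ["Email worm", "Instant messaging worm", "IRC worm",
             "P2P worm", "Web worm"],
    "Trojan": ["Backdoor Trojan", "Data-collecting Trojan",
               "Downloader or Dropper", "Proxy Trojan", "Rootkit", "Bot"],
    "Spyware": ["Adware", "Tracking cookie"],
    "Crimeware": ["Email redirector", "IM redirector", "Clicker",
                  "Transaction generator", "Session hijacker"],
}


def parent_category(subcategory):
    """Return the main category that lists `subcategory`, or None if unknown."""
    for category, subcategories in SMALL_TAXONOMY.items():
        if subcategory in subcategories:
            return category
    return None


def is_a(name, category):
    """True if `name` is the main category itself or one of its subcategories."""
    return name == category or parent_category(name) == category
```

For instance, `parent_category("Rootkit")` looks up the main category under which rootkits are listed, which is the kind of subclass query an OWL reasoner would answer against the real ontology.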
Figure 1. SMalL Ontology
3.3 SMalL Java Application

The SMalL Java Application is a tool designed to compare available software security systems. It works in conjunction with the SMalL ontology to give users better ways to examine similarities and differences between antivirus solutions. The application allows the user to add a new antivirus to the ontology and link its properties to the available malware knowledge base. The user can afterwards compare the security systems and see exactly which ones protect against a given type of malware and which do not, which operating systems they run on, etc. The main application windows are presented in Figure 2.1, Figure 2.2 and Figure 2.3.

3.4 SMalL File Format

We believe that the file format for malware-related attacks can be an OWL file created by extracting data relevant to the given attack directly from the SMalL Ontology. For example, in the case of an adware attack the file could contain the antivirus used, the operating system it runs on, and the fact that the system might also be infected with a Trojan. If this is the case and the antivirus did not manage to find the Trojan, then supplementary scans are required to find the problem. If a system is infected by multiple malware programs, a custom file can be created that relates the problems to one another, so that on later occasions the antivirus can check for all of them when one appears.

4 Conclusions

We created an ontology for malicious software classification which is able to aid the development of malware prevention software by offering a common knowledge base and a clear classification of the existing security issues. We presented an application prototype which handles antivirus software comparison based on the information available in the ontology and user-entered data. We also proposed the SMalL file format, which is a comprehensive way to report software security issues and brings new possibilities for scanning for software security problems.
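
As an illustration of the antivirus comparison described for the SMalL Java Application, the sketch below compares two products by the malware categories they cover and the operating systems they run on. It is a simplified stand-in written in Python rather than the Java and Jena code of the prototype; the `Antivirus` record, the `compare` function and the two example products are assumptions made up for this sketch, not data from the ontology.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Antivirus:
    """Hypothetical record for one antivirus individual added to the ontology."""
    name: str
    operating_systems: frozenset  # operating systems the product runs on
    protects_against: frozenset   # malware categories the product covers


def compare(first, second):
    """Summarize shared and exclusive coverage of two antivirus products."""
    return {
        "both": sorted(first.protects_against & second.protects_against),
        "only_first": sorted(first.protects_against - second.protects_against),
        "only_second": sorted(second.protects_against - first.protects_against),
        "shared_os": sorted(first.operating_systems & second.operating_systems),
    }


# Made-up products and capabilities, for illustration only.
av_a = Antivirus("ExampleAV-A", frozenset({"Windows"}),
                 frozenset({"Virus", "Worm", "Trojan"}))
av_b = Antivirus("ExampleAV-B", frozenset({"Windows", "Linux"}),
                 frozenset({"Virus", "Spyware"}))

report = compare(av_a, av_b)
```

Set intersection and difference are enough here because each product is described only by which categories it covers; a richer comparison would query the ontology for subcategories as well.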
Figure 2.1 Main application window
Figure 2.2 Add new antivirus window
Figure 2.3 Antivirus comparison window