INTRODUCTION

Most job-scheduling approaches for parallel machines apply space sharing, which means allocating CPUs/nodes to jobs in a dedicated manner and sharing the machine among multiple jobs by allocating them to different subsets of nodes. Some approaches apply time sharing (more precisely, a combination of time and space sharing), i.e. they use multiple time slices per CPU/node. Given a stream of parallel jobs and a set of computing resources, job scheduling determines when and where to execute each job. In the standard working model, when a parallel job arrives at the system, the scheduler tries to allocate the required number of processors to the job for the duration of its runtime and, if they are available, starts the job immediately. If the requested processors are currently unavailable, the job is queued and scheduled to start at a later time. The most commonly evaluated metrics include system metrics such as utilization and throughput, and user metrics such as turnaround time and wait time. The typical charging model is based on the total amount of resources used by a job (resources × runtime).
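To make the working model above concrete, here is a minimal sketch in Java (the language used later in this report) of the allocate-or-queue decision and the resources × runtime charging rule. The Job fields, the freeProcessors counter, and the method names are illustrative assumptions, not part of any particular scheduler.

    import java.util.ArrayDeque;
    import java.util.Queue;

    class Job {
        final int processors;  // number of CPUs/nodes requested
        final double runtime;  // requested runtime

        Job(int processors, double runtime) {
            this.processors = processors;
            this.runtime = runtime;
        }

        // Typical charging model: total resources used = processors x runtime.
        double charge() { return processors * runtime; }
    }

    class SpaceSharingScheduler {
        private int freeProcessors;
        private final Queue<Job> waitQueue = new ArrayDeque<>();

        SpaceSharingScheduler(int totalProcessors) {
            this.freeProcessors = totalProcessors;
        }

        // Standard working model: start the job immediately if enough
        // processors are free, otherwise queue it for a later start.
        void submit(Job job) {
            if (job.processors <= freeProcessors) {
                freeProcessors -= job.processors;  // dedicated (space-shared) allocation
                System.out.println("job started, charge = " + job.charge());
            } else {
                waitQueue.add(job);
            }
        }

        // When a running job finishes, release its processors and start
        // queued jobs, in arrival order, for as long as they fit.
        void finish(Job job) {
            freeProcessors += job.processors;
            while (!waitQueue.isEmpty()
                    && waitQueue.peek().processors <= freeProcessors) {
                submit(waitQueue.poll());
            }
        }
    }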
Data mining, the extraction of hidden predictive information from
large databases, is a powerful new technology with great potential to
help companies focus on the most important information in their data
warehouses. Data mining tools predict future trends and behaviors,
allowing businesses to make proactive, knowledge-driven decisions.
The automated, prospective analyses offered by data mining move
beyond the analyses of past events provided by retrospective tools
typical of decision support systems. Data mining tools can answer
business questions that traditionally were too time consuming to
resolve. They scour databases for hidden patterns, finding predictive
information that experts may miss because it lies outside their
expectations.
Most companies already collect and refine massive quantities of
data. Data mining techniques can be implemented rapidly on existing
software and hardware platforms to enhance the value of existing
information resources, and can be integrated with new products and
systems as they are brought on-line. When implemented on high
performance client/server or parallel processing computers, data
mining tools can analyze massive databases to deliver answers to
questions such as, "Which clients are most likely to respond to my
next promotional mailing, and why?"
Data mining (DM), also called Knowledge-Discovery in Databases (KDD) or Knowledge-Discovery and Data Mining, is the process of automatically searching large volumes of data for patterns using tools such as classification, association rule mining, clustering, etc. Data mining is a complex topic with links to multiple core fields such as computer science, and it adds value to rich seminal computational techniques from statistics, information retrieval, machine learning and pattern recognition.
Data mining techniques are the result of a long process of research
and product development. This evolution began when business data
was first stored on computers, continued with improvements in data
access, and more recently, generated technologies that allow users to
navigate through their data in real time. Data mining takes this
evolutionary process beyond retrospective data access and navigation
to prospective and proactive information delivery. Data mining is ready
for application in the business community because it is supported by
three technologies that are now sufficiently mature:
o Massive data collection
o Powerful multiprocessor computers
o Data mining algorithms
Commercial databases are growing at unprecedented rates. A recent META Group survey of data warehouse projects found that 19% of respondents are beyond the 50 gigabyte level, while 59% expect to be there by the second quarter of 1996. In some industries, such as retail, these numbers can be much larger. The accompanying need for improved computational engines can now be met in a cost-effective manner with parallel multiprocessor computer technology. Data mining algorithms embody techniques that have existed for at least 10 years, but have only recently been implemented as mature, reliable, understandable tools that consistently outperform older statistical methods.
Overview of the System
There are two main types of scheduling: system-level scheduling and application-level scheduling. The scheduling system analyzes the load situation of every node and selects one node to run the job. The scheduling policy is to optimize the total performance of the whole system. If the system is heavily loaded, the scheduling system has to balance the load and increase throughput and resource utilization under the given constraints. This kind of scheduling is known as system-level scheduling.
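As a rough illustration of the system-level policy just described, the sketch below selects the least-loaded node for each incoming job. The Node class and the use of a running-job count as the load measure are simplifying assumptions; a real system would weigh CPU, memory, and network load.

    class Node {
        final String name;
        int runningJobs;  // crude load measure: jobs currently on the node

        Node(String name) { this.name = name; }
    }

    class SystemLevelScheduler {
        // System-level scheduling: analyze the load situation of every
        // node and select one node (here, the least loaded) to run the job.
        static Node selectNode(Node[] nodes) {
            Node best = nodes[0];
            for (Node n : nodes)
                if (n.runningJobs < best.runningJobs) best = n;
            return best;
        }

        public static void main(String[] args) {
            Node[] cluster = { new Node("n0"), new Node("n1"), new Node("n2") };
            cluster[0].runningJobs = 3;
            cluster[1].runningJobs = 1;  // least loaded, so it is chosen
            cluster[2].runningJobs = 2;

            Node chosen = selectNode(cluster);
            chosen.runningJobs++;  // dispatch the job to the chosen node
            System.out.println("job dispatched to " + chosen.name);
        }
    }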
If multiple jobs arrive within a unit scheduling time slot, the scheduling system must allocate an appropriate number of jobs to every node in order to finish these jobs under a defined objective. The objective is usually minimal average execution time. This scheduling policy is application-oriented, so we call it application-level scheduling.
A genetic algorithm (GA) is a search technique used in computing to find exact or approximate solutions to optimization and search problems. Genetic algorithms are categorized as global search heuristics. They are a particular class of evolutionary algorithms that use techniques inspired by evolutionary biology such as inheritance, mutation, selection, and crossover (also called recombination).
Genetic algorithms are implemented as a computer simulation in which
a population of abstract representations (called chromosomes or the
genotype or the genome) of candidate solutions (called individuals,
creatures, or phenotypes) to an optimization problem evolves toward
better solutions. Traditionally, solutions are represented in binary as
strings of 0s and 1s, but other encodings are also possible. The
evolution usually starts from a population of randomly generated individuals and proceeds in generations. In each generation, the fitness
of every individual in the population is evaluated, multiple individuals
are stochastically selected from the current population (based on their
fitness), and modified (recombined and possibly mutated) to form a
new population. The new population is then used in the next iteration
of the algorithm. Commonly, the algorithm terminates when either a
maximum number of generations has been produced, or a satisfactory
fitness level has been reached for the population. If the algorithm has
terminated due to a maximum number of generations, a satisfactory
solution may or may not have been reached.
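The generational loop just described can be written down compactly. The sketch below is a generic Java rendering: the bit-counting "OneMax" fitness, binary tournament selection (one stochastic, fitness-based choice among several possible schemes), and all population sizes and rates are illustrative assumptions.

    import java.util.Random;

    class SimpleGA {
        static final int POP = 50, LEN = 32, MAX_GENERATIONS = 200;
        static final double CROSSOVER_RATE = 0.9, MUTATION_RATE = 0.01;
        static final Random rnd = new Random();

        // Placeholder fitness: number of 1-bits (the "OneMax" toy problem).
        static int fitness(boolean[] genome) {
            int f = 0;
            for (boolean bit : genome) if (bit) f++;
            return f;
        }

        public static void main(String[] args) {
            // Evolution starts from a population of randomly generated individuals.
            boolean[][] pop = new boolean[POP][LEN];
            for (boolean[] g : pop)
                for (int i = 0; i < LEN; i++) g[i] = rnd.nextBoolean();

            for (int gen = 0; gen < MAX_GENERATIONS; gen++) {
                boolean[][] next = new boolean[POP][];
                for (int i = 0; i < POP; i++) {
                    // Stochastic, fitness-based selection (binary tournament).
                    boolean[] a = pop[rnd.nextInt(POP)], b = pop[rnd.nextInt(POP)];
                    boolean[] p1 = fitness(a) >= fitness(b) ? a : b;
                    boolean[] c = pop[rnd.nextInt(POP)], d = pop[rnd.nextInt(POP)];
                    boolean[] p2 = fitness(c) >= fitness(d) ? c : d;

                    // Recombination (single-point crossover), then mutation.
                    boolean[] child = p1.clone();
                    if (rnd.nextDouble() < CROSSOVER_RATE) {
                        int cut = rnd.nextInt(LEN);
                        for (int j = cut; j < LEN; j++) child[j] = p2[j];
                    }
                    for (int j = 0; j < LEN; j++)
                        if (rnd.nextDouble() < MUTATION_RATE) child[j] = !child[j];
                    next[i] = child;
                }
                pop = next;  // the new population is used in the next iteration

                // Terminate early if a satisfactory fitness level is reached.
                for (boolean[] g : pop)
                    if (fitness(g) == LEN) {
                        System.out.println("solved at generation " + gen);
                        return;
                    }
            }
            System.out.println("stopped at the maximum number of generations");
        }
    }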
A typical genetic algorithm requires two things to be defined:
1. a genetic representation of the solution domain,
2. a fitness function to evaluate the solution domain.
A standard representation of the solution is an array of bits. Arrays of other types and structures can be used in essentially the same way. The main property that makes these genetic representations convenient is that their parts are easily aligned due to their fixed size, which facilitates simple crossover operations. Variable-length representations may also be used, but crossover implementation is more complex in this case. Tree-like representations are explored in genetic programming, and free-form representations are explored in HBGA (human-based genetic algorithms).
The fitness function is defined over the genetic representation and
measures the quality of the represented solution. The fitness function
is always problem dependent. For instance, in the knapsack problem
we want to maximize the total value of objects that we can put in a
knapsack of some fixed capacity. A representation of a solution might
be an array of bits, where each bit represents a different object, and
the value of the bit (0 or 1) represents whether or not the object is in
the knapsack. Not every such representation is valid, as the size of
objects may exceed the capacity of the knapsack. The fitness of the
solution is the sum of values of all objects in the knapsack if the
representation is valid, or 0 otherwise. In some problems, it is hard or
even impossible to define the fitness expression; in these cases,
interactive genetic algorithms are used.
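The knapsack fitness rule above translates almost directly into code. In this sketch the object values, weights, and the capacity are arbitrary illustrative numbers.

    class KnapsackFitness {
        // One bit per object: true means the object is packed.
        static final int[] VALUE  = { 60, 100, 120 };  // illustrative values
        static final int[] WEIGHT = { 10,  20,  30 };  // illustrative sizes
        static final int CAPACITY = 50;                // fixed knapsack capacity

        // Fitness = sum of values of packed objects if the representation
        // is valid (total size within capacity), and 0 otherwise.
        static int fitness(boolean[] bits) {
            int value = 0, weight = 0;
            for (int i = 0; i < bits.length; i++) {
                if (bits[i]) { value += VALUE[i]; weight += WEIGHT[i]; }
            }
            return weight <= CAPACITY ? value : 0;
        }

        public static void main(String[] args) {
            System.out.println(fitness(new boolean[] { false, true, true }));  // 220, valid
            System.out.println(fitness(new boolean[] { true,  true, true }));  // 0, exceeds capacity
        }
    }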
Once the genetic representation and the fitness function are defined, the GA proceeds to initialize a population of solutions randomly, then improves it through repeated application of the mutation, crossover, and selection operators.
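For completeness, here is what the crossover and mutation operators mentioned above typically look like on bit-string genomes. Single-point crossover and independent bit-flip mutation are conventional choices, not the only possible ones.

    import java.util.Random;

    class GAOperators {
        static final Random rnd = new Random();

        // Single-point crossover: the child copies the first parent up to a
        // random cut point and the second parent from there on; fixed-size
        // representations make this alignment trivial.
        static boolean[] crossover(boolean[] parent1, boolean[] parent2) {
            boolean[] child = parent1.clone();
            int cut = rnd.nextInt(child.length);
            System.arraycopy(parent2, cut, child, cut, child.length - cut);
            return child;
        }

        // Bit-flip mutation: each bit is flipped independently with a small
        // probability, keeping the search from stagnating.
        static void mutate(boolean[] genome, double rate) {
            for (int i = 0; i < genome.length; i++)
                if (rnd.nextDouble() < rate) genome[i] = !genome[i];
        }
    }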
Abstract
Job scheduling is the key feature of any computing environment
and the efficiency of computing depends largely on the scheduling
technique used. Intelligence is the key factor lacking in today's job scheduling techniques. Genetic algorithms are powerful
search techniques based on the mechanisms of natural selection and
natural genetics.
The scheduler handles multiple jobs, and the resources a job needs are at remote locations. Here we assume that the resources a job needs reside at a single location rather than being split over nodes, and that each node holding a resource runs a fixed number of jobs.

The existing algorithms are non-predictive and employ greedy algorithms or variants of them. The efficiency of the job scheduling process would increase if previous experience and genetic algorithms were used.

In this paper, we propose a model of the scheduling algorithm in which the scheduler can learn from previous experience, so that effective job scheduling is achieved as time progresses.
Description of the Problem
Similar systems already available are non-predictive and employ greedy algorithms or variants of them; that is, the existing systems do not predict situations in advance, so jobs cannot be scheduled across the network in a way that utilizes resources at the optimal level. The problem is to reduce the processing overhead during scheduling.

The proposed system addresses data transfer between the computers of two different networks.
Existing Method
The data mining algorithms can be categorized as follows:
o Association algorithms
o Classification algorithms
o Clustering algorithms

Classification:
The process of dividing a dataset into mutually exclusive groups
such that the members of each group are as "close" as possible to one
another, and different groups are as "far" as possible from one
another, where distance is measured with respect to specific
variable(s) you are trying to predict. For example, a typical
classification problem is to divide a database of companies into groups
that are as homogeneous as possible with respect to a
creditworthiness variable with values "Good" and "Bad."
Clustering:
The process of dividing a dataset into mutually exclusive groups
such that the members of each group are as "close" as possible to one
another, and different groups are as "far" as possible from one
another, where distance is measured with respect to all available
variables.
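To illustrate the distance-based grouping described above, the following sketch performs one assignment pass of a k-means-style clustering, measuring Euclidean distance over all available variables. The two fixed group centers and the sample records are arbitrary illustrative data.

    class ClusteringSketch {
        // Euclidean distance measured with respect to all available variables.
        static double distance(double[] a, double[] b) {
            double sum = 0;
            for (int i = 0; i < a.length; i++)
                sum += (a[i] - b[i]) * (a[i] - b[i]);
            return Math.sqrt(sum);
        }

        public static void main(String[] args) {
            double[][] centers = { { 0, 0 }, { 10, 10 } };         // illustrative group centers
            double[][] records = { { 1, 2 }, { 9, 8 }, { 0, 1 } }; // illustrative records

            // Assign each record to its closest group so that members of a
            // group are as "close" as possible to one another.
            for (double[] r : records) {
                int best = 0;
                for (int c = 1; c < centers.length; c++)
                    if (distance(r, centers[c]) < distance(r, centers[best])) best = c;
                System.out.println("record -> group " + best);
            }
        }
    }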
Given databases of sufficient size and quality, data mining technology
can generate new business opportunities by providing these
capabilities:
• Automated prediction of trends and behaviors. Data mining automates the process of finding predictive information in large databases. Questions that traditionally required extensive hands-on analysis can now be answered quickly and directly from the data. A typical example of a predictive problem is targeted marketing. Data mining uses data on past promotional mailings to identify the targets most likely to maximize return on investment in future mailings. Other predictive problems include forecasting bankruptcy and other forms of default, and identifying segments of a population likely to respond similarly to given events.
• Automated discovery of previously unknown patterns. Data mining tools sweep through databases and identify previously hidden patterns in one step. An example of pattern discovery is the analysis of retail sales data to identify seemingly unrelated products that are often purchased together. Other pattern discovery problems include detecting fraudulent credit card transactions and identifying anomalous data that could represent data-entry keying errors.
Proposed System
Job scheduling is the key feature of any computing environment, and the efficiency of computing depends largely on the scheduling technique used. A popular technique, the genetic algorithm, is used in the systems across the network to schedule jobs according to the predicted load. Here the system takes care of the scheduling of data packets between the source and destination computers.
• Job scheduling routes the packets at all the ports in the router
• A queue of data packets is maintained, in which the scheduling algorithm is implemented
• First Come First Serve scheduling and genetic algorithm scheduling are invoked between source and destination (an FCFS sketch follows this list)
• A comparison of the two algorithms is shown in this proposed system
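As a sketch of the FCFS half of the comparison, the queue below forwards packets strictly in arrival order. The Packet fields and the router methods are assumptions made for illustration; they are not the project's actual classes.

    import java.util.ArrayDeque;
    import java.util.Queue;

    class Packet {
        final int source, destination;  // node ids in Network 0 and Network 1

        Packet(int source, int destination) {
            this.source = source;
            this.destination = destination;
        }
    }

    class FcfsRouter {
        // Packets wait in arrival order; no reordering is ever performed.
        private final Queue<Packet> queue = new ArrayDeque<>();

        void enqueue(Packet p) { queue.add(p); }

        // First Come First Serve: always forward the oldest waiting packet.
        void forwardNext() {
            Packet p = queue.poll();
            if (p != null)
                System.out.println("forwarding " + p.source + " -> " + p.destination);
        }
    }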
Hardware Specifications:
Processor     : Intel Processor IV
RAM           : 128 MB
Hard disk     : 20 GB
CD drive      : 40x Samsung
Floppy drive  : 1.44 MB
Monitor       : 15-inch Samtron color
Keyboard      : 108-key Mercury keyboard
Mouse         : Logitech mouse
Software Specifications:
Operating System : Windows XP/2000
Language used    : J2sdk1.4.0, JCreator
Module Design
Simulated Model:
The simulated network model is constructed by arranging two groups of computers as Network 0 and Network 1. Between the two networks a router is placed, through which data flows from one network to the other.

First Come First Serve Algorithm:
The packet transfer between the networks is implemented using the FCFS algorithm.

Genetic Algorithm:
The packet transfer between the networks is implemented using the genetic algorithm. The algorithm details were discussed in the proposed system design.

Projecting Results and Comparison:
The data transfer between the source and destination networks is shown by drawing the path between source and destination. To draw the path, the points across the network are also collected. The results of the two algorithms are displayed to the user in a separate frame so that the efficiency of the genetic algorithm can be seen (a sketch of such a comparison follows).
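A comparison along these lines might simply run both schedulers over the same packet trace and report an average transfer time for each. The PacketScheduler interface and the metric below are assumptions for illustration (reusing the Packet class from the FCFS sketch), not the project's actual reporting code.

    import java.util.List;

    interface PacketScheduler {
        // Returns the simulated average transfer time for a packet trace.
        double run(List<Packet> trace);
    }

    class Comparison {
        static void compare(PacketScheduler fcfs, PacketScheduler genetic,
                            List<Packet> trace) {
            double fcfsTime = fcfs.run(trace);
            double gaTime = genetic.run(trace);
            System.out.println("FCFS average transfer time    : " + fcfsTime);
            System.out.println("Genetic average transfer time : " + gaTime);
            System.out.println(gaTime < fcfsTime
                    ? "the genetic algorithm performed better on this trace"
                    : "FCFS performed better on this trace");
        }
    }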