Project report on Data Clustering

Group – 9
Project I (CSE 791)
7th Semester, CSE.
UIT, Burdwan University.

Project Guide:
Mr. Dipankar Dutta
(Associate Professor of CSE & IT Department,
UIT, Burdwan University)

Project Members:
Puja Mukherjee (20081013)
Sunandita Chattopadhyay (20081001)
Rakesh Mukherjee (20081055)
Bibaswann Bandyopadhyay (20081017)
Ankita Ghosh (20081051)

Fields Covered:
Neural Network:
• Self Organizing Maps
• Learning Vector Quantization
Evolutionary Computation:
• Particle Swarm Optimization
• Ant Colony Optimization
• Simulated Annealing

Clustering: One of the most important unsupervised learning
problems. It groups similar data items together. Here we
cluster continuous data using Neural Network and
Evolutionary Computation techniques.

[Figure: Continuous Data; Clusters of Data]

Self Organizing Map
1. Normalize the input data to fall between -1 and 1
(using multiplicative normalization).
2. Feed the normalized input data to the input layer of the SOM.
3. Convert each output to a bipolar number.
4. Set the learning rate to 0.99 initially.
5. Choose as the winning neuron the one that produced the largest bipolar
value.
6. Use a matrix, called the correction matrix, to hold the corrections.
7. Calculate the error as follows.
For all input neurons:
a) Calculate the difference between the training set entry and the
corresponding weight matrix entry.
b) Add the difference to the correction matrix entry for that input neuron.
c) Take the square of the difference as the error value.
8. Adjust the weights using the subtractive method:
a) For the winning neuron, adjust the input weights by the
correction matrix values multiplied by the learning rate.
9. Decrease the learning rate.
10. Check whether the current error is the best error so far; if so, update the
best error value.
11. Repeat steps 5 to 10 until the error value has decreased continuously for the last 50 iterations.
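
The steps above can be sketched roughly as follows. This is a minimal illustration, not the project's implementation: the function names (`normalize`, `winner`, `train_som`), the random weight initialization, the geometric learning-rate decay factor, and the fixed number of epochs are all assumptions. The bipolar conversion and the explicit correction matrix are folded into a direct per-input update of the winner, and the "last 50 iterations" stopping rule is replaced by a fixed epoch count for brevity.

```python
import math
import random

def normalize(vec):
    """Multiplicative normalization: scale a vector to unit length."""
    length = math.sqrt(sum(v * v for v in vec))
    return [v / length for v in vec] if length > 0 else list(vec)

def winner(weights, x):
    """Index of the neuron with the largest output (dot product with x)."""
    outputs = [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in weights]
    return max(range(len(weights)), key=outputs.__getitem__)

def train_som(data, n_neurons, epochs=100, rate=0.99, decay=0.99, seed=0):
    """Winner-take-all training; decay and epochs are assumed values."""
    rng = random.Random(seed)
    inputs = [normalize(x) for x in data]
    dim = len(inputs[0])
    # Assumption: weights start as random unit vectors.
    weights = [normalize([rng.uniform(-1, 1) for _ in range(dim)])
               for _ in range(n_neurons)]
    best_error = float("inf")
    for _ in range(epochs):
        error = 0.0
        for x in inputs:
            j = winner(weights, x)
            # Difference between the input and the winner's weights.
            diff = [x_i - w_i for x_i, w_i in zip(x, weights[j])]
            error += sum(d * d for d in diff)
            # Subtractive update: move the winner toward the input.
            weights[j] = [w_i + rate * d for w_i, d in zip(weights[j], diff)]
        rate *= decay                      # step 9: decrease the learning rate
        best_error = min(best_error, error)  # step 10: track the best error
    return weights, best_error
```

After training, `winner(weights, normalize(x))` gives the cluster index for a new input `x`. With random initialization some neurons may never win, so results depend on the seed.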

Learning Vector Quantization
1. Load the input dataset.
2. Set maximum number of cluster centers = M (number of class labels).
   Set minimum number of cluster centers = 0 (zero).
3. FOR an input pattern P
   a. FIND the closest cluster center C to P.
      i. IF not found THEN
         Allocate P as a new cluster center C.
      ii. ELSE
         FIND the distance of the cluster center C from P.
         A. IF (distance > THRESHOLD) THEN
            Allocate P as a new cluster center C.
         B. ELSE
            i. Attach P to the cluster center C.
            ii. Calculate the new cluster center of cluster C.
   b. REPEAT step (a) for all inputs.
4. REPEAT step 3 for 100 iterations.
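
A small sketch of the threshold-based procedure above, under stated assumptions: the function name `threshold_cluster` is hypothetical, the new center is taken to be the mean of the points attached during the current pass, and once the maximum number of centers M is reached a distant point is simply attached to its nearest center. The slides do not specify either of those last two details.

```python
import math

def dist(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def threshold_cluster(data, threshold, max_centers, iterations=100):
    centers = []       # current cluster centers
    assignment = []    # cluster index of each input after the last pass
    for _ in range(iterations):
        members = [[] for _ in centers]   # points attached in this pass
        assignment = []
        for p in data:
            c = None
            if centers:
                # Closest existing center to P.
                c = min(range(len(centers)), key=lambda i: dist(centers[i], p))
            too_far = c is not None and dist(centers[c], p) > threshold
            if c is None or (too_far and len(centers) < max_centers):
                # Allocate P as a new cluster center.
                centers.append(list(p))
                members.append([p])
                c = len(centers) - 1
            else:
                # Attach P and recompute the center as the mean (assumption).
                members[c].append(p)
                centers[c] = [sum(col) / len(members[c])
                              for col in zip(*members[c])]
            assignment.append(c)
    return centers, assignment
```

For example, `threshold_cluster([[0, 0], [0.1, 0], [5, 5], [5.1, 5]], threshold=1.0, max_centers=2, iterations=3)` places the first two points in one cluster and the last two in another.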

Particle Swarm Optimization
1. Initialize each particle with k random cluster centroids.
2. For t = 1 to t_max do
   a. For each particle i do
      b. For each data vector z in the dataset
         i. Calculate the Euclidean distance of
            z to all cluster centroids.
         ii. Assign z to the cluster that has the
            nearest centroid to z.
      iii. Calculate the fitness function.
   c. Update the global best and local best positions.
   d. Update the cluster centroids according to the velocity
      update and particle position update formulas of PSO.
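
The loop above might look like the following sketch. Several things here are assumptions not stated in the slides: the fitness function (mean distance from each data vector to its nearest centroid), the inertia weight `w` and acceleration coefficients `c1`, `c2`, the initialization of centroids from randomly chosen data vectors, and the function names themselves.

```python
import math
import random

def dist(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def fitness(centroids, data):
    """Assumed fitness: mean distance of each vector to its nearest centroid.
    Taking the minimum implicitly assigns z to its nearest cluster."""
    return sum(min(dist(z, c) for c in centroids) for z in data) / len(data)

def pso_cluster(data, k, n_particles=10, t_max=50,
                w=0.72, c1=1.49, c2=1.49, seed=0):
    rng = random.Random(seed)
    dim = len(data[0])
    # Each particle holds k centroids, drawn from random data vectors.
    pos = [[list(rng.choice(data)) for _ in range(k)] for _ in range(n_particles)]
    vel = [[[0.0] * dim for _ in range(k)] for _ in range(n_particles)]
    pbest = [[c[:] for c in p] for p in pos]
    pbest_fit = [fitness(p, data) for p in pos]
    g = min(range(n_particles), key=pbest_fit.__getitem__)
    gbest, gbest_fit = [c[:] for c in pbest[g]], pbest_fit[g]
    for _ in range(t_max):
        # Evaluate particles and update local and global bests.
        for i in range(n_particles):
            f = fitness(pos[i], data)
            if f < pbest_fit[i]:
                pbest[i], pbest_fit[i] = [c[:] for c in pos[i]], f
            if f < gbest_fit:
                gbest, gbest_fit = [c[:] for c in pos[i]], f
        # Standard PSO velocity and position updates on every coordinate.
        for i in range(n_particles):
            for j in range(k):
                for d in range(dim):
                    r1, r2 = rng.random(), rng.random()
                    vel[i][j][d] = (w * vel[i][j][d]
                                    + c1 * r1 * (pbest[i][j][d] - pos[i][j][d])
                                    + c2 * r2 * (gbest[j][d] - pos[i][j][d]))
                    pos[i][j][d] += vel[i][j][d]
    return gbest, gbest_fit
```

The returned `gbest` is the best set of k centroids found; each data vector can then be labelled with the index of its nearest centroid.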

Ant Colony Optimization
1. Place every item Xi on a random cell of the grid;
2. Place every ant k on a random cell of the grid unoccupied by ants;
3. iteration_count ← 1;
4. while iteration_count < maximum_iteration do
5.    for i = 1 to no_of_ants do
6.       if unladen ant and cell occupied by item Xi then
7.          compute f(Xi) and Ppick-up(Xi); pick up item Xi with probability Ppick-up(Xi);
8.       else
9.          if ant carrying item Xi and cell empty then
10.            compute f(Xi) and Pdrop(Xi);
11.            drop item Xi with probability Pdrop(Xi);
12.         end if
13.      end if
14.      move to a randomly selected neighboring and unoccupied cell;
15.   end for
16.   iteration_count ← iteration_count + 1
17. end while
18. print locations of items
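
The following is a rough sketch of the grid-based ant clustering loop above. The slides do not define f(Xi), Ppick-up, or Pdrop, so the forms used here are assumptions: a neighbourhood similarity density for f, and the commonly used pick-up and drop probabilities with constants `k1` and `k2`. The toroidal grid, the parameter values, and all function names are also illustrative choices.

```python
import random

def density(item, cell, grid, data, size, alpha=0.5, radius=1):
    """Assumed f(Xi): average similarity of `item` to items in nearby cells."""
    r, c = cell
    total = 0.0
    for dr in range(-radius, radius + 1):
        for dc in range(-radius, radius + 1):
            if dr == 0 and dc == 0:
                continue
            other = grid.get(((r + dr) % size, (c + dc) % size))
            if other is not None:
                d = sum((a - b) ** 2
                        for a, b in zip(data[item], data[other])) ** 0.5
                total += 1.0 - d / alpha
    return max(0.0, total / (2 * radius + 1) ** 2)

def p_pick(f, k1=0.1):
    """Assumed pick-up probability: high when few similar items are nearby."""
    return (k1 / (k1 + f)) ** 2

def p_drop(f, k2=0.15):
    """Assumed drop probability: high when many similar items are nearby."""
    return (f / (k2 + f)) ** 2

def ant_cluster(data, size=10, n_ants=5, max_iter=2000, seed=0):
    rng = random.Random(seed)
    cells = [(r, c) for r in range(size) for c in range(size)]
    # grid maps a cell to the index of the item lying on it.
    grid = dict(zip(rng.sample(cells, len(data)), range(len(data))))
    ants = [{"pos": p, "item": None} for p in rng.sample(cells, n_ants)]
    for _ in range(max_iter):
        for ant in ants:
            pos = ant["pos"]
            if ant["item"] is None and pos in grid:
                f = density(grid[pos], pos, grid, data, size)
                if rng.random() < p_pick(f):
                    ant["item"] = grid.pop(pos)
            elif ant["item"] is not None and pos not in grid:
                f = density(ant["item"], pos, grid, data, size)
                if rng.random() < p_drop(f):
                    grid[pos] = ant["item"]
                    ant["item"] = None
            # Move to a random neighbouring cell not occupied by an ant.
            occupied = {a["pos"] for a in ants}
            r, c = pos
            free = [((r + dr) % size, (c + dc) % size)
                    for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                    if (dr, dc) != (0, 0)]
            free = [cell for cell in free if cell not in occupied]
            if free:
                ant["pos"] = rng.choice(free)
    return grid, ants
```

The returned `grid` holds the final item locations. Items still being carried when the loop ends remain with their ants, so a real run would need a final drop phase before reading clusters off the grid.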

Simulated Annealing
Coordinator node algorithm:
1. Distribute the n random initial solutions to the n nodes and wait.
2. Upon receiving the first converged result from any of the nodes,
stop simulated annealing on the other nodes.

Worker node algorithm:
1. Accept the initial solution from the coordinator.
2. repeat
   2.1. Execute simulated annealing for p iterations. Exchange
   partial results among the worker nodes. Accept the best partial
   result.
   2.2. p = p - r * (loop iteration number).
   until (p <= 0).
3. Execute simulated annealing using the best solution found as the
initial solution.
4. Send the converged value to the coordinator.
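
The coordinator and worker scheme can be illustrated with a sequential simulation, where a list of solutions stands in for the worker nodes. This is only a sketch under assumptions: the cost function (sum of squared distances to the nearest centroid), the Gaussian perturbation move, the geometric cooling schedule, the restart of the temperature in each round, and every name and default value below are illustrative and do not come from the slides. No real inter-node communication is performed.

```python
import math
import random

def cost(centroids, data):
    """Assumed objective: sum of squared distances to the nearest centroid."""
    return sum(min(sum((a - b) ** 2 for a, b in zip(z, c)) for c in centroids)
               for z in data)

def anneal(solution, data, steps, temp, rng, cooling=0.95, step_size=0.5):
    """Run `steps` annealing moves; return the best solution seen and its cost."""
    current, cur_cost = [c[:] for c in solution], cost(solution, data)
    best, best_cost = [c[:] for c in current], cur_cost
    for _ in range(steps):
        cand = [c[:] for c in current]
        j, d = rng.randrange(len(cand)), rng.randrange(len(cand[0]))
        cand[j][d] += rng.gauss(0.0, step_size)   # perturb one coordinate
        cand_cost = cost(cand, data)
        delta = cand_cost - cur_cost
        # Accept improvements always, worse moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, cur_cost = cand, cand_cost
            if cur_cost < best_cost:
                best, best_cost = [c[:] for c in current], cur_cost
        temp = max(temp * cooling, 1e-12)         # avoid division by zero
    return best, best_cost

def parallel_sa(data, k, n_workers=4, p=40, r=5, temp=1.0, seed=0):
    rng = random.Random(seed)
    # Coordinator: one random initial solution per worker.
    solutions = [[list(rng.choice(data)) for _ in range(k)]
                 for _ in range(n_workers)]
    best = solutions[0]
    loop = 0
    while p > 0:
        loop += 1
        # Each worker anneals for p iterations, then results are exchanged.
        results = [anneal(s, data, p, temp, rng) for s in solutions]
        best, _ = min(results, key=lambda res: res[1])
        solutions = [[c[:] for c in best] for _ in range(n_workers)]
        p -= r * loop
    # Final run from the best solution found (one run stands in for all workers).
    return anneal(best, data, 200, temp, rng)
```

Because `p` shrinks by `r * loop` each round, the workers exchange results more and more frequently before the final independent run.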
