assignment/data sets/Description of UCI Datasets.docx

Description of UCI Datasets

The files in the UCI datasets directory contain training files and test files for three datasets. Both the training file and the test file are text files containing data in tabular format. Each value is a number, and values are separated by white space. The i-th row and j-th column contain the value for the j-th dimension of the i-th object. The only exception is the LAST column, which stores the class label for each object. Make sure you do not use data from the last column (i.e., the class labels) as part of the input vector.

The datasets are copied from the UCI repository of machine learning datasets. Here are some details on each dataset:

· The pendigits dataset. This dataset contains data for pen-based recognition of handwritten digits.
    · 7494 training objects.
    · 3498 test objects.
    · 16 dimensions.
    · 10 classes.
· The satellite dataset. The full name of this dataset is Statlog (Landsat Satellite) Data Set, and it contains data for classification of pixels in satellite images.
    · 4435 training objects.
    · 2000 test objects.
    · 36 dimensions.
    · 6 classes.
· The yeast dataset. This dataset contains some biological data.
    · 1000 training objects.
    · 484 test objects.
    · 8 dimensions.
    · 10 classes.

For each dataset, a training file and a test file are provided. The name of each file indicates which dataset the file belongs to, and whether the file contains training or test data.

Note that, for the purposes of your assignments, it does not matter at all where the data come from. The methods that you are asked to implement should work on all three datasets, as well as ANY other datasets following the same format.
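The file format described above can be read with a few lines of Python. The following is a minimal sketch, not a required implementation: the function names `parse_uci_table` and `load_uci_file` are hypothetical, and the code assumes that each object sits on its own line and that class labels are integer-coded. It keeps the last column out of the input vectors, as the description requires.

```python
import io


def parse_uci_table(stream):
    """Split whitespace-separated rows into (feature vectors, class labels).

    Assumption: one object per line, with the class label in the last column.
    """
    features, labels = [], []
    for line in stream:
        values = line.split()  # splits on any run of white space
        if not values:         # skip blank lines
            continue
        # All columns except the last form the input vector.
        features.append([float(v) for v in values[:-1]])
        # Assumption: labels are integer-coded numbers.
        labels.append(int(float(values[-1])))
    return features, labels


def load_uci_file(path):
    """Read one training or test file from disk (hypothetical helper)."""
    with open(path) as f:
        return parse_uci_table(f)


# Usage on the first two rows of pendigits_test.txt.
sample = io.StringIO(
    "88 92 2 99 16 66 94 37 70 0 0 24 42 65 100 100 8\n"
    "80 100 18 98 60 66 100 29 42 0 0 23 42 61 56 98 8\n"
)
features, labels = parse_uci_table(sample)
print(len(features), len(features[0]), labels)  # prints: 2 16 [8, 8]
```

Because the number of dimensions is taken from each row rather than hard-coded, the same sketch should apply to the satellite and yeast files, and to any other file in the same layout.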
assignment/data sets/pendigits_test.txt

88 92 2 99 16 66 94 37 70 0 0 24 42 65 100 100 8
80 100 18 98 60 66 100 29 42 0 0 23 42 61 56 98 8
0 94 9 57 20 19 7 0 20 36 70 68 100 100 18 92 8
95 82 71 100 27 77 77 73 100 80 93 42 56 13 0 0 9
68 100 6 88 47 75 87 82 85 56 100 29 75 6 0 0 9
70 100 100 97 70 81 45 65 30 49 20 33 0 16 0 0 1
40 100 0 81 15 58 100 57 47 87 50 88 40 42 36 0 4
3 71 0 95 45 100 100 99 79 78 48 53 31 24 54 0 7
79 87 98 81 71 100 72 73 100 66 91 21 48 0 0 13 9
92 95 30 100 34 68 87 89 84 78 100 35 64 0 0 19 9
58 64 100 96 27 100 0 63 79 65 91 72 48 36 10 0 9
34 89 3 70 1 25 49 0 100 23 100 67 56 99 0 100 0
0 90 46 100 88 92 79 69 60 48 39 27 47 6 100 0 2
20 71 0 29 31 0 78 12 100 51 84 93 37 100 8 66 0
100 100 67 98 41 80 44 50 78 42 68 16 35 2 0 0 5
91 69 48 57 9 79 60 100 100 75 95 40 64 8 0 0 9
30 74 55 100 89 87 66 56 100 38 92 8 41 0 0 20 3
5 65 0 89 37 100 88 97 10 ...