A Weighted Nearest Neighbor Algorithm for Learning with Symbolic Features

Scott Cost; Steven Salzberg

doi:10.1023/A:1022664626993

Research output: Contribution to journal › Article › peer-review

450 Scopus citations

Abstract

In the past, nearest neighbor algorithms for learning from examples have worked best in domains in which all features had numeric values. In such domains, the examples can be treated as points and distance metrics can use standard definitions. In symbolic domains, a more sophisticated treatment of the feature space is required. We introduce a nearest neighbor algorithm for learning in domains with symbolic features. Our algorithm calculates distance tables that allow it to produce real-valued distances between instances, and attaches weights to the instances to further modify the structure of feature space. We show that this technique produces excellent classification accuracy on three problems that have been studied by machine learning researchers: predicting protein secondary structure, identifying DNA promoter sequences, and pronouncing English text. Direct experimental comparisons with the other learning algorithms show that our nearest neighbor algorithm is comparable or superior in all three domains. In addition, our algorithm has advantages in training speed, simplicity, and perspicuity. We conclude that experimental evidence favors the use and continued development of nearest neighbor algorithms for domains such as the ones studied here.
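The abstract does not give the formula behind the distance tables, so the following is only a minimal sketch of one plausible construction. It assumes a value-difference-style table: for each symbolic feature, the distance between two values is the sum, over classes, of the absolute difference in how often each value co-occurs with that class. The function names, the toy data, and the single multiplicative `weight` parameter standing in for the instance weights are all illustrative assumptions, not the authors' implementation.

```python
from collections import Counter, defaultdict


def build_distance_table(values, labels):
    """Return {(v1, v2): distance} for one symbolic feature.

    Assumed formula (not stated in the abstract): the distance between two
    values is the sum over classes of |P(class | v1) - P(class | v2)|.
    """
    class_counts = defaultdict(Counter)  # value -> Counter of class labels
    for v, c in zip(values, labels):
        class_counts[v][c] += 1
    classes = sorted(set(labels))
    table = {}
    for v1, counts1 in class_counts.items():
        n1 = sum(counts1.values())
        for v2, counts2 in class_counts.items():
            n2 = sum(counts2.values())
            table[(v1, v2)] = sum(
                abs(counts1[c] / n1 - counts2[c] / n2) for c in classes
            )
    return table


def instance_distance(x, y, tables, weight=1.0):
    """Real-valued distance between two instances with symbolic features.

    `weight` is a hypothetical per-exemplar multiplier standing in for the
    instance weights mentioned in the abstract.
    """
    return weight * sum(tables[i][(a, b)] for i, (a, b) in enumerate(zip(x, y)))


# Toy data (invented for illustration): two symbolic features, two classes.
train = [("A", "x"), ("A", "y"), ("C", "x"), ("G", "y")]
labels = ["pos", "pos", "neg", "neg"]
tables = [
    build_distance_table([row[i] for row in train], labels) for i in range(2)
]
```

In this toy setup, `instance_distance(("A", "x"), ("C", "y"), tables)` looks up one table entry per feature and sums them. Passing a `weight` above 1 makes an exemplar appear farther away, which is one way an unreliable stored instance could be discouraged from acting as the nearest neighbor.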

Original language	English (US)
Pages (from-to)	57-78
Number of pages	22
Journal	Machine Learning
Volume	10
Issue number	1
DOIs	https://doi.org/10.1023/A:1022664626993
State	Published - Jan 1993
Externally published	Yes

Keywords

Nearest neighbor
exemplar-based learning
instance-based learning
protein structure
text pronunciation

ASJC Scopus subject areas

Software
Artificial Intelligence

Cite this

@article{b47b08451dfd4c029ae78d55d27da65d,
title = "A Weighted Nearest Neighbor Algorithm for Learning with Symbolic Features",
abstract = "In the past, nearest neighbor algorithms for learning from examples have worked best in domains in which all features had numeric values. In such domains, the examples can be treated as points and distance metrics can use standard definitions. In symbolic domains, a more sophisticated treatment of the feature space is required. We introduce a nearest neighbor algorithm for learning in domains with symbolic features. Our algorithm calculates distance tables that allow it to produce real-valued distances between instances, and attaches weights to the instances to further modify the structure of feature space. We show that this technique produces excellent classification accuracy on three problems that have been studied by machine learning researchers: predicting protein secondary structure, identifying DNA promoter sequences, and pronouncing English text. Direct experimental comparisons with the other learning algorithms show that our nearest neighbor algorithm is comparable or superior in all three domains. In addition, our algorithm has advantages in training speed, simplicity, and perspicuity. We conclude that experimental evidence favors the use and continued development of nearest neighbor algorithms for domains such as the ones studied here.",
keywords = "Nearest neighbor, exemplar-based learning, instance-based learning, protein structure, text pronunciation",
author = "Scott Cost and Steven Salzberg",
year = "1993",
month = jan,
doi = "10.1023/A:1022664626993",
language = "English (US)",
volume = "10",
pages = "57--78",
journal = "Machine Learning",
issn = "0885-6125",
publisher = "Springer Netherlands",
number = "1",
}

TY  - JOUR
T1  - A Weighted Nearest Neighbor Algorithm for Learning with Symbolic Features
AU  - Cost, Scott
AU  - Salzberg, Steven
PY  - 1993/1
Y1  - 1993/1
N2  - In the past, nearest neighbor algorithms for learning from examples have worked best in domains in which all features had numeric values. In such domains, the examples can be treated as points and distance metrics can use standard definitions. In symbolic domains, a more sophisticated treatment of the feature space is required. We introduce a nearest neighbor algorithm for learning in domains with symbolic features. Our algorithm calculates distance tables that allow it to produce real-valued distances between instances, and attaches weights to the instances to further modify the structure of feature space. We show that this technique produces excellent classification accuracy on three problems that have been studied by machine learning researchers: predicting protein secondary structure, identifying DNA promoter sequences, and pronouncing English text. Direct experimental comparisons with the other learning algorithms show that our nearest neighbor algorithm is comparable or superior in all three domains. In addition, our algorithm has advantages in training speed, simplicity, and perspicuity. We conclude that experimental evidence favors the use and continued development of nearest neighbor algorithms for domains such as the ones studied here.
AB  - In the past, nearest neighbor algorithms for learning from examples have worked best in domains in which all features had numeric values. In such domains, the examples can be treated as points and distance metrics can use standard definitions. In symbolic domains, a more sophisticated treatment of the feature space is required. We introduce a nearest neighbor algorithm for learning in domains with symbolic features. Our algorithm calculates distance tables that allow it to produce real-valued distances between instances, and attaches weights to the instances to further modify the structure of feature space. We show that this technique produces excellent classification accuracy on three problems that have been studied by machine learning researchers: predicting protein secondary structure, identifying DNA promoter sequences, and pronouncing English text. Direct experimental comparisons with the other learning algorithms show that our nearest neighbor algorithm is comparable or superior in all three domains. In addition, our algorithm has advantages in training speed, simplicity, and perspicuity. We conclude that experimental evidence favors the use and continued development of nearest neighbor algorithms for domains such as the ones studied here.
KW  - Nearest neighbor
KW  - exemplar-based learning
KW  - instance-based learning
KW  - protein structure
KW  - text pronunciation
UR  - http://www.scopus.com/inward/record.url?scp=34250080806&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=34250080806&partnerID=8YFLogxK
U2  - 10.1023/A:1022664626993
DO  - 10.1023/A:1022664626993
M3  - Article
AN  - SCOPUS:34250080806
SN  - 0885-6125
VL  - 10
SP  - 57
EP  - 78
JO  - Machine Learning
JF  - Machine Learning
IS  - 1
ER  - 
