Resource description
A small Python demo that recognizes handwritten digits with the k-nearest neighbors (KNN) algorithm. It includes both the code and the dataset and is suitable for complete beginners.

Code snippet and file information
# encoding: utf-8
import numpy as np
import os
import operator
# Path to the training set directory (backslashes escaped so "\U" is not read as a unicode escape)
Dir = 'C:\\Users\\zqwke\\Desktop\\dataset\\digits\\trainingDigits\\'
# Path to the test set directory
Dir_test = 'C:\\Users\\zqwke\\Desktop\\dataset\\digits\\testDigits\\'
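'''
Algorithm outline:
1. Convert each 32x32 data matrix into a 1x1024 vector.
   Stack the whole training set into an Mx1024 matrix A, where each row is one handwritten digit.
2. Convert the single digit under test into a 1x1024 vector in the same way, then repeat it
   M times (M = number of training samples) to get an Mx1024 matrix B whose rows are all identical.
3. Compute the row-wise distance between A and B, giving the distance from the test digit to every training sample.
4. Sort the distances to find the K nearest points.
5. The digit that occurs most often among those K points is the classification result.
'''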
# Convert a raw data file from a [32, 32] grid of characters into a [1, 1024] vector
def ImageVector(Dir, filename):
    # Create a 1x1024 zero vector
    ImageVec = np.zeros(1024)
    # Copy the 0/1 characters of each of the 32 rows into the vector
    with open(Dir + filename) as fr:
        for i in range(32):
            line = fr.readline()
            for j in range(32):
                ImageVec[32 * i + j] = int(line[j])
    return ImageVec
# Load every file under Dir into a sample matrix plus a label vector
def LoadData(Dir):
    List = os.listdir(Dir)  # names of all files under Dir
    NumTraining = len(List)  # number of files
    TrainingSet = np.zeros((NumTraining, 1024))  # Mx1024 matrix, M = number of files
    Labels = np.zeros(NumTraining)  # label vector
    # Store each sample as one row of the matrix
    for i in range(NumTraining):
        FileName = List[i]
        TrainingSet[i, :] = ImageVector(Dir, FileName)
        # The label is the digit before the underscore in the file name, e.g. 0_13.txt -> 0
        Labels[i] = int(FileName.split('_')[0])
    return TrainingSet, Labels
# Compute the distance. The differences are already squared in Classify, so this
# function only sums them and takes the square root:
# distance = (sum((trainingSet - input) ** 2)) ** 0.5
def DisCompute(Array):
    return np.sum(Array) ** 0.5
# KNN classification
def Classify(input, DataSet, Label, k):
    DataSize = DataSet.shape[0]
    # Repeat the test vector DataSize times and subtract the training matrix
    diffMat = np.tile(input, (DataSize, 1)) - DataSet
    sqDiffMat = diffMat ** 2
    Distance_sequence = np.zeros(len(sqDiffMat))  # distance to every training sample
    for i in range(len(sqDiffMat)):
        Distance_sequence[i] = DisCompute(sqDiffMat[i])
    # argsort returns the indices that sort the distances in ascending order, not the distances themselves
    sortedDistance = Distance_sequence.argsort()
    classcount = {}  # K-nearest-neighbor vote counts
    # Store the labels of the K closest points in the dict: key = label, value = frequency
    for i in range(k):
        voteLabel = Label[sortedDistance[i]]
        classcount[voteLabel] = classcount.get(voteLabel, 0) + 1
    # Sort the dict by value and return the most frequent label as the classification result
    sortedClassCount = sorted(classcount.items(), key=operator.itemgetter(1), reverse=True)
    return sortedClassCount[0][0]
def handWritingTest():
    print('......Loading Training Data......')
    trainingset, Label_x = LoadData(Dir)
    print('......Loading Testing Data.......')
    testingset, Label_y = LoadData(Dir_test)
    testing_num = len(testingset)
    count = 0
    print('......Testing......')
    # Start testing: print "Right" and the sample index when the prediction is correct,
    # otherwise print "False" and the sample index
    for i in range(testing_num):
        output = Classify(testingset[i], trainingset, Label_x, 5)
        if output == Label_y[i]:
            print('Right %d' % i)
            count += 1
        else:
            print('False %d' % i)
    TheRightRate = float(count) / testing_num
    print('Accuracy: %.4f' % TheRightRate)
    return TheRightRate

if __name__ == '__main__':
    handWritingTest()
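The five steps in the docstring can also be written without the explicit np.tile copy or the per-row DisCompute loop. The sketch below is a vectorized variant, not the author's code: it assumes NumPy broadcasting to subtract the 1x1024 test vector from every row of the Mx1024 training matrix, computes the Euclidean distance sqrt(sum_j (A[i, j] - x[j]) ** 2) for all rows at once, and uses collections.Counter for the majority vote. The function name knn_classify and the 4-pixel toy data are hypothetical.

```python
from collections import Counter

import numpy as np


def knn_classify(x, train_set, train_labels, k=5):
    """Return the majority label among the k training rows closest to x.

    x: shape (D,) test vector; train_set: shape (M, D); train_labels: shape (M,).
    """
    # Broadcasting subtracts x from every row, so no np.tile copy is needed (steps 2-3).
    distances = np.sqrt(((train_set - x) ** 2).sum(axis=1))
    # Indices of the k smallest distances (step 4).
    nearest = np.argsort(distances)[:k]
    # Most common label among the k neighbors (step 5).
    votes = Counter(train_labels[nearest].tolist())
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical toy data: 4 "images" of length 4 instead of 1024.
    train = np.array([[0, 0, 0, 0],
                      [0, 0, 0, 1],
                      [1, 1, 1, 1],
                      [1, 1, 1, 0]], dtype=float)
    labels = np.array([0, 0, 1, 1])
    # The 3 nearest rows have labels 1, 0, 1, so this prints 1.
    print(knn_classify(np.array([1, 1, 0, 1], dtype=float), train, labels, k=3))
```

With the real data this would be called as knn_classify(testingset[i], trainingset, Label_x, 5). The design choice is only about speed: broadcasting skips building the tiled copy of the test vector, and a single sum over axis 1 replaces the Python loop over rows.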

Attribute         Size  Date        Time   Name
-----------  ---------  ----------  -----  ----
File              1088  2010-10-07  21:35  digits\testDigits\0_0.txt
File              1088  2010-10-07  21:35  digits\testDigits\0_1.txt
File              1088  2010-10-07  21:35  digits\testDigits\0_10.txt
File              1088  2010-10-07  21:35  digits\testDigits\0_11.txt
File              1088  2010-10-07  21:35  digits\testDigits\0_12.txt
File              1088  2010-10-07  21:35  digits\testDigits\0_13.txt
File              1088  2010-10-07  21:35  digits\testDigits\0_14.txt
File              1088  2010-10-07  21:35  digits\testDigits\0_15.txt
File              1088  2010-10-07  21:35  digits\testDigits\0_16.txt
File              1088  2010-10-07  21:35  digits\testDigits\0_17.txt
File              1088  2010-10-07  21:35  digits\testDigits\0_18.txt
File              1088  2010-10-07  21:35  digits\testDigits\0_19.txt
File              1088  2010-10-07  21:35  digits\testDigits\0_2.txt
File              1088  2010-10-07  21:35  digits\testDigits\0_20.txt
File              1088  2010-10-07  21:35  digits\testDigits\0_21.txt
File              1088  2010-10-07  21:35  digits\testDigits\0_22.txt
File              1088  2010-10-07  21:35  digits\testDigits\0_23.txt
File              1088  2010-10-07  21:35  digits\testDigits\0_24.txt
File              1088  2010-10-07  21:35  digits\testDigits\0_25.txt
File              1088  2010-10-07  21:35  digits\testDigits\0_26.txt
File              1088  2010-10-07  21:35  digits\testDigits\0_27.txt
File              1088  2010-10-07  21:35  digits\testDigits\0_28.txt
File              1088  2010-10-07  21:35  digits\testDigits\0_29.txt
File              1088  2010-10-07  21:35  digits\testDigits\0_3.txt
File              1088  2010-10-07  21:35  digits\testDigits\0_30.txt
File              1088  2010-10-07  21:35  digits\testDigits\0_31.txt
File              1088  2010-10-07  21:35  digits\testDigits\0_32.txt
File              1088  2010-10-07  21:35  digits\testDigits\0_33.txt
File              1088  2010-10-07  21:35  digits\testDigits\0_34.txt
File              1088  2010-10-07  21:35  digits\testDigits\0_35.txt
File              1088  2010-10-07  21:35  digits\testDigits\0_36.txt
............ (2850 more file entries omitted)
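Every listed file is 1088 bytes, which is consistent with 32 rows of 32 "0"/"1" characters, each followed by a two-byte Windows line ending (32 × 34 = 1088); the line-ending detail is an inference from the sizes, not something the page states. The names follow the pattern <digit>_<index>.txt, which is what lets LoadData read the label from the file name. The sketch below parses one such file's text into the 1x1024 vector with reshape and extracts the label; the helper names parse_digit_text and label_from_name and the sample grid are hypothetical.

```python
import os

import numpy as np


def parse_digit_text(text):
    """Turn 32 lines of '0'/'1' characters into a flat vector of length 1024."""
    # splitlines() strips both '\n' and '\r\n' line endings.
    rows = [line[:32] for line in text.splitlines()[:32]]
    grid = np.array([[int(ch) for ch in row] for row in rows], dtype=float)
    # Row-major flattening: pixel (i, j) lands at index 32 * i + j, as in ImageVector.
    return grid.reshape(1024)


def label_from_name(path):
    """Extract the digit label from a name such as 'digits\\testDigits\\0_13.txt'."""
    # Normalize Windows separators so basename works on any OS.
    name = os.path.basename(path.replace("\\", "/"))
    return int(name.split("_")[0])


if __name__ == "__main__":
    # Hypothetical file content: an all-zero 32x32 grid with a single 1 at row 2, column 5.
    lines = ["0" * 32 for _ in range(32)]
    lines[2] = "0" * 5 + "1" + "0" * 26
    vec = parse_digit_text("\r\n".join(lines) + "\r\n")
    # prints: (1024,) 69 0
    print(vec.shape, int(vec.argmax()), label_from_name(r"digits\testDigits\0_13.txt"))
```

The single 1 ends up at index 32 * 2 + 5 = 69, which is the same position ImageVector would write it to.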