
Resource overview

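Algorithm idea: extract the TF-IDF weights of each document, then use cosine similarity (the cosine of the angle between two multi-dimensional vectors) to measure how similar two documents are. With that similarity measure, the standard k-means algorithm is enough to cluster the texts. The source code is implemented in Java.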
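The three steps described above can be sketched as follows. This is a minimal illustration, not the code in the archive: the class name `TextClusterSketch`, the method names, the `tf * log(N / df)` weighting, and the choice of seeding the centroids with the first k documents are all assumptions made for the example. Documents are passed in already tokenised.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical sketch: TF-IDF vectors, cosine similarity, and plain k-means.
public class TextClusterSketch {

    /** Builds one TF-IDF vector per document over the shared, sorted vocabulary. */
    static double[][] tfidf(List<List<String>> docs) {
        TreeSet<String> vocab = new TreeSet<>();
        for (List<String> d : docs) vocab.addAll(d);
        List<String> terms = new ArrayList<>(vocab);

        // Document frequency: in how many documents each term occurs.
        Map<String, Integer> df = new HashMap<>();
        for (List<String> d : docs)
            for (String t : new TreeSet<>(d)) df.merge(t, 1, Integer::sum);

        double[][] w = new double[docs.size()][terms.size()];
        for (int i = 0; i < docs.size(); i++) {
            List<String> d = docs.get(i);
            // Term frequency, normalised by document length.
            for (String t : d) w[i][terms.indexOf(t)] += 1.0 / d.size();
            // Multiply by inverse document frequency (assumed form: log(N / df)).
            for (int j = 0; j < terms.size(); j++)
                w[i][j] *= Math.log((double) docs.size() / df.get(terms.get(j)));
        }
        return w;
    }

    /** Cosine of the angle between two vectors; 0 if either vector is all zeros. */
    static double cosine(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return (na == 0 || nb == 0) ? 0 : dot / Math.sqrt(na * nb);
    }

    /** k-means using cosine similarity; the first k vectors seed the centroids (an assumption). */
    static int[] kMeans(double[][] x, int k, int maxIter) {
        int n = x.length, dim = x[0].length;
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++) centroids[c] = x[c].clone();
        int[] label = new int[n];
        Arrays.fill(label, -1);

        for (int iter = 0; iter < maxIter; iter++) {
            // Assignment step: attach each document to its most similar centroid.
            boolean changed = false;
            for (int i = 0; i < n; i++) {
                int best = 0;
                for (int c = 1; c < k; c++)
                    if (cosine(x[i], centroids[c]) > cosine(x[i], centroids[best])) best = c;
                if (best != label[i]) { label[i] = best; changed = true; }
            }
            if (!changed) break;

            // Update step: each centroid becomes the mean of its members.
            for (int c = 0; c < k; c++) {
                double[] sum = new double[dim];
                int count = 0;
                for (int i = 0; i < n; i++) {
                    if (label[i] != c) continue;
                    count++;
                    for (int j = 0; j < dim; j++) sum[j] += x[i][j];
                }
                if (count > 0) {
                    for (int j = 0; j < dim; j++) sum[j] /= count;
                    centroids[c] = sum;
                }
            }
        }
        return label;
    }

    public static void main(String[] args) {
        List<List<String>> docs = Arrays.asList(
            Arrays.asList("apple", "banana", "apple"),
            Arrays.asList("car", "engine", "wheel"),
            Arrays.asList("banana", "apple", "fruit"),
            Arrays.asList("engine", "car", "road"));
        System.out.println(Arrays.toString(kMeans(tfidf(docs), 2, 20)));
    }
}
```

Seeding with the first k documents keeps the sketch deterministic. A real run would more likely pick random or better-spread seeds, since k-means is sensitive to initialisation.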

Code snippet and file information

package textcluster;

import java.util.List;

/**
 * Tokeniser interface.
 */
public interface ITokeniser
{
    List<String> partition(String input);
}
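To show how the interface might be used, here is a hypothetical implementation that splits on whitespace and lower-cases each token. It is an assumption made for illustration: the archive's own `Tokeniser.java` is not shown on this page, and Chinese text would need a real word segmenter rather than whitespace splitting. The interface is repeated so that the example compiles on its own.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Repeated here only to keep the example self-contained.
interface ITokeniser {
    List<String> partition(String input);
}

// Hypothetical implementation; not the Tokeniser class from the archive.
public class WhitespaceTokeniser implements ITokeniser {
    @Override
    public List<String> partition(String input) {
        List<String> tokens = new ArrayList<>();
        if (input == null) {
            return tokens;
        }
        // Split on runs of whitespace and skip the empty piece a blank input yields.
        for (String piece : input.trim().split("\\s+")) {
            if (!piece.isEmpty()) {
                tokens.add(piece.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }
}
```

Calling `new WhitespaceTokeniser().partition("Text clustering demo")` would return the three lower-cased words in order.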

Attribute         Size  Date        Time   Name
-----------  ---------  ----------  -----  ----
File              1510  2009-05-08  07:30  textcluster\WawaCluster.java
File              5669  2009-05-08  07:57  textcluster\WawaKMeans.java
File               204  2009-05-07  11:02  textcluster\ITokeniser.java
File              1487  2009-05-07  21:58  textcluster\Tokeniser.java
File              3474  2009-05-08  07:55  textcluster\Program.java
File              1152  2009-05-07  22:02  textcluster\StopWordsHandler.java
File              1392  2009-05-07  11:04  textcluster\TermVector.java
File              6930  2009-05-08  10:27  textcluster\TFIDFMeasure.java
File               606  2009-05-07  10:45  textcluster\input.txt
Directory            0  2009-05-08  16:55  textcluster
-----------  ---------  ----------  -----  ----
                 22424                     10

