Notes on Using UMAP for Dimensionality Reduction in Python
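The Python UMAP library (umap-learn) offers a newer approach to feature dimensionality reduction, but combining it with machine learning libraries such as scikit-learn can lead to data type problems, as in the following code: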
import numpy as np
import sklearn.cluster  # "import sklearn" alone does not guarantee that the cluster submodule is loaded
import umap

# GenerateFeatures, GenerateFeaturesForPredicting, PreprocessData,
# nComponentCount and nClusterCount are defined elsewhere

# Generate features
arrFeatures = GenerateFeatures()

# Run dimensionality reduction with UMAP
umpDimReductionFitter = umap.umap_.UMAP(n_components=nComponentCount)
umpDimReductionFitter.fit(arrFeatures)
arrFeaturesTransformed = umpDimReductionFitter.transform(arrFeatures)

# Other preprocessing
arrFeaturesTransformed = np.array(PreprocessData(arrFeaturesTransformed))

# Clustering with K-Means
# Fitting
clsCluster = sklearn.cluster.KMeans(n_clusters=nClusterCount)
clsCluster.fit(arrFeaturesTransformed)

# Clustering
arrFeaturesForPredicting = GenerateFeaturesForPredicting()
arrFeaturesForPredictingTransformed = umpDimReductionFitter.transform(arrFeaturesForPredicting)
arrFeaturesForPredictingTransformed = np.array(PreprocessData(arrFeaturesForPredictingTransformed))
arrResults = clsCluster.predict(arrFeaturesForPredictingTransformed)
With this code, the call to clsCluster.predict may fail with the following error:
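ValueError: Buffer dtype mismatch, expected 'const float' but got 'double'

This indicates a data type problem in the data produced by the earlier steps; replacing umap.umap_.UMAP with sklearn.decomposition.PCA makes the script work. So, after the np.array() call, check the data type of arrFeaturesTransformed with the following code: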
print(arrFeaturesTransformed.dtype.name)
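Checking a single array may not reveal the mismatch, since the error concerns two arrays of different precision. As a minimal sketch, the dtypes of the training and prediction arrays can be compared directly. The arrays below are hypothetical stand-ins, not real UMAP output, and dtypes_match is an illustrative helper name rather than anything from umap-learn or scikit-learn:

```python
import numpy as np

# Hypothetical stand-in arrays (not real UMAP output): one double
# precision, one single precision, to mimic a mixed-precision pipeline.
arr_fit = np.array([[0.1, 0.2], [0.3, 0.4]], dtype=np.float64)
arr_predict = np.array([[0.5, 0.6]], dtype=np.float32)


def dtypes_match(a, b):
    """Return True when both arrays share the same dtype."""
    return a.dtype == b.dtype


# Print both dtype names and whether they agree
print(arr_fit.dtype.name, arr_predict.dtype.name)
print(dtypes_match(arr_fit, arr_predict))
```

If the two names differ, the arrays handed to fit and predict do not share a precision, which is the kind of situation the error message describes.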
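The output is float64, so both np.array() calls were changed to request that dtype explicitly, which ensures that the arrays passed to fit and predict have the same precision: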
arrFeaturesTransformed = np.array(PreprocessData(arrFeaturesTransformed), dtype=np.float64)
# ...
arrFeaturesForPredictingTransformed = np.array(PreprocessData(arrFeaturesForPredictingTransformed), dtype=np.float64)
With this change, the script runs normally.
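The same idea can be wrapped in a small helper so that the conversion is applied identically before fitting and before predicting. This is only a sketch under assumptions: as_float64 is a hypothetical name, and the input arrays are made-up stand-ins for the preprocessed UMAP output:

```python
import numpy as np


def as_float64(data):
    """Convert array-like data to a float64 NumPy array."""
    return np.asarray(data, dtype=np.float64)


# Hypothetical stand-in for a single-precision embedding
emb32 = np.array([[1.0, 2.0], [3.0, 4.0]], dtype=np.float32)

# Both arrays end up as float64, whatever their original precision
arr_fit = as_float64(emb32)
arr_predict = as_float64([[5.0, 6.0]])

print(arr_fit.dtype.name, arr_predict.dtype.name)
```

Routing both the training data and the prediction data through one helper keeps the two code paths from drifting apart if the preprocessing changes later.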
Page version: 1, last edited: 24 Nov 2025 09:33





