Decision Tree Basics: A Hand-Written CART Implementation in Python

Source: https://blog.csdn.net/bo_hai/article/details/143773201

References:
https://blog.csdn.net/weixin_45666566/article/details/107954454
https://blog.csdn.net/Elenstone/article/details/105328111
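
As a quick reminder of the criterion the code below implements: for a two-class problem where p is the proportion of positive samples in a node, the Gini impurity is Gini(p) = 2p(1 - p), and a candidate binary split is scored by the size-weighted sum of the two children's impurities (smaller is better). A minimal standalone sketch of that computation; the helper names gini and split_gini are mine, not from the original code:

def gini(p):
    # Gini impurity of a two-class node with positive-class proportion p
    return 2 * p * (1 - p)

def split_gini(n_left, p_left, n_right, p_right):
    # Size-weighted Gini of a binary split; the split with the smallest
    # value is chosen as the best split
    n = n_left + n_right
    return (n_left / n) * gini(p_left) + (n_right / n) * gini(p_right)

print(split_gini(6, 1/3, 9, 2/3))  # -> 0.444...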

The code is as follows:
# -*- coding: utf-8 -*-
import operator

import numpy as np
import pandas as pd


def loadDataSet():
    # Load the training data; the first four columns are features,
    # the last column is the class label.
    csv = pd.read_csv(filepath_or_buffer=r'D:/PythonData/决策树.csv')
    dataSet = np.array(csv)
    labels = np.array(csv.columns)[:4]
    # Unique class values, sorted so that targets[0] is the "positive" class.
    targets = sorted(np.unique(dataSet[:, -1:].flatten()), reverse=True)
    return dataSet, labels, targets


def calcProbabilityEnt(dataSet, targets):
    # Proportion of samples belonging to the first class (binary classification).
    numEntries = len(dataSet)  # number of rows
    if numEntries == 0:  # guard: a split can produce an empty half
        return 0.0
    feaCounts = 0
    fea1 = targets[0]
    for featVec in dataSet:
        if featVec[-1] == fea1:
            feaCounts += 1
    return float(feaCounts) / numEntries


def splitDataSet(dataSet, index, value):
    # Binary split: rows where feature `index` equals `value` vs. the rest.
    # The splitting column is removed from both halves.
    retDataSet = []
    noRetDataSet = []
    for featVec in dataSet:
        reduced = np.concatenate((featVec[:index], featVec[index + 1:]))
        if featVec[index] == value:
            retDataSet.append(reduced)
        else:
            noRetDataSet.append(reduced)
    return retDataSet, noRetDataSet


def chooseBestFeatureToSplit(dataSet, targets):
    numFeatures = len(dataSet[0]) - 1
    if numFeatures == 1:
        return 0
    bestGini = 1
    bestFeatureIndex = -1
    for i in range(numFeatures):
        # Unique values taken by feature i.
        uniqueVals = set(example[i] for example in dataSet)
        for value in uniqueVals:
            subDataSet, noSubDataSet = splitDataSet(dataSet=dataSet, index=i, value=value)
            prob = len(subDataSet) / float(len(dataSet))
            noProb = len(noSubDataSet) / float(len(dataSet))
            probabilityEnt = calcProbabilityEnt(subDataSet, targets)
            noProbabilityEnt = calcProbabilityEnt(noSubDataSet, targets)
            # Weighted Gini of the binary split; for two classes Gini(p) = 2p(1-p).
            feaGini = round(prob * 2 * probabilityEnt * (1 - probabilityEnt)
                            + noProb * 2 * noProbabilityEnt * (1 - noProbabilityEnt), 2)
            if bestGini > feaGini:
                bestGini = feaGini
                bestFeatureIndex = i
    return bestFeatureIndex


def majorityCnt(classList):
    # Majority vote over the remaining class labels.
    classCount = {}
    for vote in classList:
        classCount[vote] = classCount.get(vote, 0) + 1
    sortedClassCount = sorted(classCount.items(), key=operator.itemgetter(1), reverse=True)
    return sortedClassCount[0][0]


def createTree(dataSet, labels, targets):
    classList = [example[-1] for example in dataSet]
    # Stop if all samples share one class, or if no features remain.
    if classList.count(classList[0]) == len(classList):
        return classList[0]
    if len(dataSet[0]) == 1:
        return majorityCnt(classList=classList)
    bestFeatIndex = chooseBestFeatureToSplit(dataSet=dataSet, targets=targets)
    bestFeatLabel = labels[bestFeatIndex]
    # np.delete returns a new array rather than modifying `labels` in place;
    # the chosen feature must be removed so the label list stays aligned with
    # the reduced columns produced by splitDataSet.
    subLabels = np.delete(labels, bestFeatIndex)
    # Unique values of the chosen feature; although splits are scored with the
    # binary CART criterion, the tree then branches on every unique value.
    uniqueVals = set(example[bestFeatIndex] for example in dataSet)
    myTree = {bestFeatLabel: {}}  # the tree is stored as nested dicts
    for value in uniqueVals:
        subDataSet, noSubDataSet = splitDataSet(dataSet, bestFeatIndex, value)
        myTree[bestFeatLabel][value] = createTree(subDataSet, subLabels, targets)  # recurse
    return myTree


if __name__ == '__main__':
    dataSet, labels, targets = loadDataSet()
    print(createTree(dataSet, labels, targets))
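
The CSV file read by loadDataSet is not included in the post. Judging from the feature names and class labels visible in the output below, the data looks like the classic loan-approval example (four categorical features plus a decision column). A hypothetical file of the same shape, purely so the script can be tried out (the 年龄 and 信贷情况 columns and all row values here are my guesses, not the author's data):

年龄,有工作,有自己的房子,信贷情况,类别
青年,否,否,一般,不同意
青年,是,否,好,同意
中年,否,是,好,同意
老年,否,否,一般,不同意
老年,是,是,非常好,同意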
The output of a run is as follows:
PS D:\PythonWorkSpace> & E:/anaconda3/python.exe d:/PythonWorkSpace/DecisionTreeDemo.py
{'有自己的房子': {'否': {'有工作': {'否': '不同意', '是': '同意'}}, '是': '同意'}}
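
Since the finished tree is just nested dicts keyed by feature name and feature value, classifying a new sample is a matter of walking the dict until a leaf (a class label) is reached. A minimal sketch; the classify helper is my addition, not part of the original post:

def classify(tree, featLabels, testVec):
    # Walk the nested-dict tree; featLabels is the original full label
    # array and testVec a full feature vector in the same column order.
    featLabel = next(iter(tree))            # feature tested at this node
    subtree = tree[featLabel]
    featIndex = list(featLabels).index(featLabel)
    branch = subtree[testVec[featIndex]]    # KeyError if the value was unseen in training
    if isinstance(branch, dict):
        return classify(branch, featLabels, testVec)
    return branch

# Hypothetical usage (feature values guessed to match the sample data above):
# print(classify(myTree, labels, ['青年', '否', '是', '一般']))  # -> '同意'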
