建立菲律宾台风损害模型
编辑建立菲律宾台风损害模型-2023/7/27
客观性
使用过去 20 年 12 场台风的数据,介绍了菲律宾台风造成的建筑物损坏统计模型的开发过程。
构建损坏模型本质是为了预测全国每个城市受损房屋的比例,使用一组与灾害相关的指标(例如风速)和社会经济指标(例如贫困发生率或墙体建筑质量)作为预测因子。
建立损坏模型:模型用于预测城市中受损房屋的比例。
数据来源
损坏模型是根据过去几十年台风造成的建筑物损坏的观测结果建立的(510 Global)。该数据集包含 1,638 个观测值,对应全国 1,034 个城市以及 12 个台风的影响。它有 34 列,对应于台风过后调查的数据以及每个城市对应的其他现场和社会经济指标。
损坏指标:台风造成的损坏以每个城市部分或完全损坏(分别为 PD[Partly damaged (abs.)] 或 CD[Completely damaged (abs.)])的房屋数量来衡量。这些指标也以每个城市房屋总数的百分比形式给出(分别为 pPD[Partly damaged (rel.)] 和 pCD[Completely damaged (rel.)])。其他列对应于根据这两个基本测量计算出的不同指标。
灾害指标:与台风事件相关的变量。
Wind speed -- 风速[公里/小时]
Distance to typhoon -- 到台风的距离[公里]:台风轨迹到城市质心的最短距离。
rainfall -- 降雨量 [毫米]
distance_first_impact -- 距首次影响的距离 [km]:台风登陆点到城市质心的距离。
Experience -- 经历[-]:本次台风之前,该市所属地区平均抵御台风的次数。
场地和暴露指标:与场地地形特征和市社会经济指标相关的变量。这些仅取决于市政府,而不取决于台风(假设它们在时间上相对恒定)。
ruggedness_stdev -- 崎岖度标准[-]:市内地形崎岖度的标准偏差。
Ruggedness -- 崎岖度 [-]:市内平均地形崎岖度。
Elevation -- 海拔 [m]:市内的平均海拔。
Slope -- 坡度[度]:市内的平均地形坡度。
slope_stdev -- 坡度标准[-]:市内地形坡度的标准偏差。
Population -- 人口数量
Population density -- 人口密度[居住区/平方公里]
land_area -- 土地面积
Poverty incidence -- 贫困发生率
% skilled Agriculture/Forestry/Fishermen -- 熟练农业/林业/渔民比例
% strong roof type -- 强屋面类型比例
% strong wall type -- 强墙型比例
数据预览
# 导入原始数据
data = pd.read_csv(r'data/All.csv')
# 查看台风名字及对应数量
data[['disaster_name']].value_counts()
disaster_name
Haima 301
Rammasun 294
Koppu 203
Bopha 175
Haiyan 162
Melor 105
Kalmaegi 92
Nock-Ten 82
Hagupit 81
Sarika 57
Utor 54
Goni 32
dtype: int64
# 输出摘要统计信息
data.describe()
Completely damaged (abs.) Partly damaged (abs.) \
count 1638.000000 1638.000000
mean 582.935897 1138.982295
std 1553.413663 2370.653848
min 1.000000 1.000000
25% 1.000000 14.000000
50% 16.000000 179.500000
75% 269.500000 1411.500000
max 14132.000000 46553.000000
Total damaged houses (abs.) total_damage_houses_0p25weight \
count 1638.000000 1638.000000
mean 1721.918193 867.681471
std 3496.164509 1952.399073
min 2.000000 1.250000
25% 19.000000 7.000000
50% 225.000000 75.500000
75% 1962.250000 694.562500
max 58823.000000 23908.250000
Partly damaged (rel.) Completely damaged (rel.) \
count 1638.000000 1638.000000
mean 11.860162 6.103325
std 17.152123 14.385213
min 0.000972 0.001168
25% 0.197830 0.025812
50% 2.587918 0.223214
75% 18.698752 3.026831
max 115.343099 95.607311
Total damaged houses (rel.) total_damage_houses_0p25weight_perc \
count 1638.000000 1638.000000
mean 17.963487 9.068365
std 26.756541 16.703810
min 0.002647 0.001654
25% 0.277203 0.105430
50% 3.259474 1.104780
75% 27.268523 9.262609
max 129.974141 98.544514
ratio_comp_part Total # of houses Wind speed Distance to typhoon \
count 1638.000000 1638.000000 1638.000000 1638.000000
mean 0.974340 11767.550218 84.682571 74.899818
std 3.676039 15013.119760 35.797212 70.071947
min 0.011246 324.250000 40.000000 0.122506
25% 0.242304 4700.749997 59.999527 24.399295
50% 0.579407 8017.625004 80.000000 50.303957
75% 0.972011 13664.437605 102.693927 104.957121
max 88.201339 194096.500200 298.464302 347.292212
rainfall distance_first_impact Slope Elevation \
count 1638.000000 1638.000000 1638.000000 1638.000000
mean 323.920596 197.212283 8.186324 258.570181
std 155.502152 118.607868 5.716850 321.773277
min 45.000000 4.357767 0.456908 2.516287
25% 244.900000 110.800300 3.935659 59.706514
50% 303.507123 175.939736 7.164828 138.017489
75% 376.500000 263.415087 11.331911 305.599518
max 1964.000000 1223.383698 25.229471 1899.717311
ruggedness_stdev Ruggedness slope_stdev \
count 1638.000000 1638.000000 1638.000000
mean 28.602452 41.832893 6.011024
std 15.179823 28.395457 3.061597
min 1.629265 3.497199 0.286051
25% 18.188352 21.022102 3.935674
50% 29.283881 36.834113 6.375094
75% 39.950044 57.243654 8.481826
max 86.703939 128.965009 13.024484
Predicted damage class (1-5) Population land_area \
count 0.0 1638.000000 1638.000000
mean NaN 47065.645910 215.933543
std NaN 60040.329748 226.523048
min NaN 1291.000000 1.856500
25% NaN 18807.000000 80.480600
50% NaN 32108.000000 146.901100
75% NaN 54658.500000 264.230700
max NaN 776382.000000 2428.147100
Population density Poverty incidence % strong roof type \
count 1638.000000 1638.000000 1638.000000
mean 452.178975 0.257932 0.834480
std 1250.825754 0.150841 0.173800
min 4.498900 0.008500 0.236160
25% 115.491400 0.136300 0.740387
50% 229.109300 0.224950 0.915456
75% 434.300650 0.358525 0.963024
max 34384.322200 0.847600 1.000000
% strong wall type % skilled Agriculture/Forestry/Fishermen \
count 1638.000000 1617.000000 # 该特征数据缺失
mean 0.589403 0.107656
std 0.206134 0.054270
min 0.053748 0.001049
25% 0.430316 0.068022
50% 0.590947 0.103011
75% 0.776601 0.138443
max 0.952283 0.340328
Experience Bicol region
count 1637.000000 # 该特征数据丢失 1638.000000
mean 2.100000 0.117216
std 0.870465 0.321776
min 1.000000 0.000000
25% 1.277778 0.000000
50% 2.034483 0.000000
75% 2.800000 0.000000
max 4.280000 1.000000
数据清洗
(1)缺失值检验
# 查看多少空值
data.isna().sum()
# 删除缺失值
data.dropna(inplace=True, subset=['% skilled Agriculture/Forestry/Fishermen', 'Experience'])
disaster_type 0
disaster_name 0
pcode 0
Completely damaged (abs.) 0
Partly damaged (abs.) 0
Total damaged houses (abs.) 0
total_damage_houses_0p25weight 0
Partly damaged (rel.) 0
Completely damaged (rel.) 0
Total damaged houses (rel.) 0
total_damage_houses_0p25weight_perc 0
ratio_comp_part 0
Total # of houses 0
Wind speed 0
Distance to typhoon 0
rainfall 0
distance_first_impact 0
Slope 0
Elevation 0
ruggedness_stdev 0
Ruggedness 0
slope_stdev 0
Predicted damage class (1-5) 1638 # 该属性不做处理因为后续不选择该属性作为特征
Population 0
land_area 0
Population density 0
Poverty incidence 0
% strong roof type 0
% strong wall type 0
% skilled Agriculture/Forestry/Fishermen 21
Region 0
prov 0
Experience 1
Bicol region 0
dtype: int64
(2)选择特征
根据数据探索分析,此处选择有用特征,剔除无用特征。
# 特征选择
data = data[['disaster_name', 'Total damaged houses (rel.)', 'Wind speed', 'Distance to typhoon', 'rainfall',
'distance_first_impact', 'Experience',
'ruggedness_stdev', 'Ruggedness', 'Elevation', 'Slope', 'slope_stdev', 'Population', 'Population density',
'land_area', 'Poverty incidence', '% skilled Agriculture/Forestry/Fishermen',
'% strong roof type', '% strong wall type']]
(3)异常值检验
没检查异常值检验前的数据概况:
# 有用特征
features = ['Total damaged houses (rel.)', 'Wind speed', 'Distance to typhoon', 'rainfall',
'distance_first_impact', 'Experience', 'ruggedness_stdev', 'Ruggedness', 'Elevation',
'Slope', 'slope_stdev', 'Population', 'Population density', 'land_area', 'Poverty incidence',
'% skilled Agriculture/Forestry/Fishermen', '% strong roof type', '% strong wall type']
# 直方图
# 显示在同一张画布
def outliers_hist_all(num, df, cols=5, save=False, save_path=None, botton=None, left=None):
plt.rcParams["font.sans-serif"] = ["SimHei"] # 设置字体
plt.rcParams["axes.unicode_minus"] = False # 该语句解决图像中的“-”负号的乱码问题
fig = plt.figure(figsize=(16, 16))
fig.text(0.5, 0.04, botton, ha='center', fontsize=28)
fig.text(0.04, 0.5, left, va='center', rotation='vertical', fontsize=18)
rows = int(np.ceil(num / cols))
for i in range(num):
ax = fig.add_subplot(rows, cols, i + 1)
ax.hist(df[df.columns[i]], bins=30)
ax.set_title(df.columns[i])
if save == True:
plt.savefig(save_path)
plt.show()
outliers_hist_all(len(features), data[features], botton='value', left='count')
# 箱线图检验
plt.figure(figsize=(8, 8))
data.boxplot(column=data[features].columns.tolist(), showmeans=True, meanline=True)
plt.xticks(rotation=90) # 旋转x轴标签,以便更好地显示特征名称
plt.show()
检查并删除异常值。
# 计算每个特征的平均值和标准差
mean = data.iloc[:, 1:].mean()
std = data.iloc[:, 1:].std()
# 定义异常值的阈值(平均值加减3倍标准差)
threshold = 3 * std
# 检查并根据3σ准则删除异常值
data = data[~((data < mean - threshold) | (data > mean + threshold)).any(axis=1)]
检查异常值并检验后的数据概况:
可以看出,根据三西格玛剔除异常值后直方图的整体分布更加聚集,箱线图中的异常点也减少一定比例。
将目标值缩放到0-1区间中
# 将房屋损坏率转换为0-1
data['Total damaged houses (rel.)'] = data['Total damaged houses (rel.)'] / 100
(4)相关性检验
data_corr = data[features].corr() # 相关系数矩阵
fig, ax = plt.subplots(figsize=(12, 12))
sns.heatmap(data_corr, annot=True, ax=ax)
plt.show()
data_corr['Total damaged houses (rel.)'].sort_values()
Distance to typhoon -0.435464
% strong wall type -0.304187
distance_first_impact -0.241009
Experience -0.236378
% strong roof type -0.206935
Elevation -0.162279
Population density -0.085511
Population -0.065211
Slope -0.057425
Ruggedness -0.052857
land_area -0.023486
slope_stdev 0.001225
ruggedness_stdev 0.001823
% skilled Agriculture/Forestry/Fishermen 0.031725
rainfall 0.229307
Poverty incidence 0.270775
Wind speed 0.722516
Total damaged houses (rel.) 1.000000
Name: Total damaged houses (rel.), dtype: float64
(5)多特征关系
将筛选出来的特征与预测变量进行展示特征之间关系,可见变量之间存在一定程度上的相关性。
数据准备
变量变换
使用Box-Cox变换来对潜在的损坏预测变量进行测试,并确定是否需要对变量进行幂律变换,以相对于链接变换的损坏比例标准化变量。
# 对预测变量使用逻辑回归转换 expit(x) = 1 / (1 + exp(-x))
link_fn_td = expit(data['Total damaged houses (rel.)'])
# 将每个特征的最大值 转换成离它最近的半整数
def power_est(bC):
l_est = bC[bC == max(bC)].iloc[0]
return round(l_est / 0.5) * 0.5
# 选择需要的特征
var_names = data.columns.tolist()
var_names = [var for var in var_names if
var not in ["disaster_type", "disaster_name", "pcode", "Total damaged houses (rel.)", "Region", "prov"]]
# 创建一个全为空的dataframe
power_df = pd.DataFrame(np.nan, index=[0], columns=var_names)
# 使用 Box-Cox 变换来测试每个潜在的损坏预测变量,以确定是否需要对变量进行幂律变换,以相对于链接变换的损坏比例标准化变量。
for i, var_i in enumerate(var_names):
power_df.iloc[0, i] = power_est(boxcox(link_fn_td - np.min(link_fn_td) * 1.01 + 0.01, data[var_i]))
td_data = data.copy()
Wind speed Distance to typhoon rainfall distance_first_impact \
0 0.0 0.0 0.0 0.0
Experience ruggedness_stdev Ruggedness Elevation Slope slope_stdev \
0 0.0 0.0 0.0 0.0 0.0 0.0
Population Population density land_area Poverty incidence \
0 0.0 0.0 0.0 -1.0
% skilled Agriculture/Forestry/Fishermen % strong roof type \
0 -1.5 -1.0
% strong wall type
0 -1.0
应用获得的变换:
# 根据power_df中的值对原始数据进行了不同的转换操作,以使得数据符合特定的分布
for i, var_i in enumerate(var_names):
if power_df.loc[0, var_i] == 0:
# 通过对数转换(np.log)
# 可以将数据转换为更接近正态分布的形式,因为对数转换可以压缩较大的值,拉伸较小的值,使数据更加对称和稳定。
td_data[var_i] = np.log(td_data[var_i])
else:
# 通过幂律变换(td_data[var_i] ** power_df.loc[0, var_i])
# 可以将数据转换为更接近幂律分布的形式,因为幂律变换可以改变数据的分布形状,使其符合幂律分布的特征。
td_data[var_i] = td_data[var_i] ** power_df.loc[0, var_i]
数据缩放
将除了目标列的特征列标准化
# 标准化
cols_mean_td = td_data[var_names].mean()
cols_sd_td = td_data[var_names].std()
scaled_data_td = (td_data[var_names] - cols_mean_td) / cols_sd_td
td_data[var_names] = scaled_data_td
数据准备后的数据对其以直方图展示:
可见,增提分布特征已经越来越明显。因此,可认为所做的数据变换是有效变换。
探索性数据分析
探索数据以确定预测变量与建筑物损坏比例之间可能的关系。
房屋受损比例:
# 选择有数值的列
numeric_data = td_data.select_dtypes(include='number')
# 数据重塑 将数据从宽格式(wide format)转换为长格式(long format)
melted_data = pd.melt(numeric_data, id_vars=['Total damaged houses (rel.)'],
var_name='var', value_name='value')
# 使用seaborn绘制散点图
plt.figure(figsize=(12, 8))
sns.scatterplot(data=melted_data, x='value', y='Total damaged houses (rel.)', hue='var')
plt.title('Scatter plot for each variable')
plt.xlabel('Value')
plt.ylabel('Total damaged houses (rel.)')
plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left')
plt.show()
features_sc = ['Wind speed', 'Distance to typhoon', 'rainfall',
'distance_first_impact', 'Experience', 'ruggedness_stdev', 'Ruggedness', 'Elevation',
'Slope', 'slope_stdev', 'Population', 'Population density', 'land_area', 'Poverty incidence',
'% skilled Agriculture/Forestry/Fishermen', '% strong roof type', '% strong wall type']
# 散点图
# 创建一个包含所有特征对应的子图标题的列表
titles = [feature for feature in features_sc]
# 计算子图的行数和列数
num_features = len(features_sc)
rows = int(num_features / 4) + 1
cols = min(num_features, 4)
# 创建一个新的图形,并设置图形的大小
fig = plt.figure(figsize=(15, 15))
# 遍历每个特征,创建对应的子图
for i, feature in enumerate(features_sc):
# 在图形中添加子图,并指定子图的位置
ax = fig.add_subplot(rows, cols, i + 1)
# 绘制散点图
ax.scatter(td_data[feature], expit(td_data['Total damaged houses (rel.)']))
# 设置子图标题
ax.set_title(titles[i])
# 设置x轴和y轴标签
ax.set_xlabel(feature)
ax.set_ylabel('Total damaged houses (rel.)')
# 调整子图之间的间距
fig.tight_layout()
# 显示图形
plt.show()
以散点图,可以看出,在一定程度上,经过我们的数据变换使得数据更加集中,这样使得原始数据将会更加有意义。
对于这个变换而言,风速根据其大小可以看到独特的行为。虽然对于较低的风速,可以观察到不少分散且显着的正斜率,但对于较高的风速(例如高于平均水平),这种损害的增量似乎饱和。
其他变量(例如海拔和降雨量)也存在类似的趋势,但由于风速是主要预测变量,因此分析仅限于此。
数据集
损坏房屋的比例:
# 处理过后的数据集展示
td_data.sample(5)
disaster_name Total damaged houses (rel.) Wind speed \
279 Hagupit 0.043671 -1.619854
1610 Utor 0.166409 0.871446
107 Bopha 0.027567 0.202984
625 Haiyan 0.777924 2.111691
1455 Rammasun 0.117990 -0.640615
Distance to typhoon rainfall distance_first_impact Experience \
279 1.369567 -3.911680 0.510022 -1.149703
1610 -0.655048 1.501795 -1.785016 1.055731
107 0.167086 1.160104 -1.089411 -1.545198
625 -2.225993 0.266232 0.776345 -1.092881
1455 0.470196 -0.169615 -0.988571 0.460752
ruggedness_stdev Ruggedness Elevation Slope slope_stdev \
279 0.412075 0.237446 -0.849878 0.107418 0.264466
1610 0.554183 0.460424 0.871855 0.487643 0.587852
107 0.947218 1.091489 1.514961 1.032755 0.884493
625 -0.100165 -0.369386 -0.775195 -0.345895 -0.073127
1455 -0.072546 0.203466 -0.599278 0.213241 0.015068
Population Population density land_area Poverty incidence \
279 -2.246590 -0.959423 -0.947639 -0.780200
1610 -0.141595 -0.978254 1.002742 -0.028983
107 1.307730 -0.582448 1.872402 -0.688154
625 0.315079 0.381172 -0.152591 -0.440168
1455 -1.148631 0.322721 -1.425962 -0.815481
% skilled Agriculture/Forestry/Fishermen % strong roof type \
279 -0.315185 1.120590
1610 -0.290428 -0.550206
107 -0.078037 0.167487
625 -0.196568 -0.581216
1455 -0.317290 2.392163
% strong wall type
279 1.268849
1610 -0.326936
107 0.723035
625 0.085402
1455 0.139717
检查多重共线性
使用广义线性模型(Generalized Linear Model)来拟合一个多分类的逻辑回归模型,以预测数据集中的'Total damaged houses (rel.)'列的值。
同时检查多重共线性,通过计算方差膨胀因子(VIF)来判断是否存在共线性问题,并在存在共线性问题时(VIF>10 )进行变量的逐步剔除。
# 使用Generalized Linear Model(广义线性模型)来拟合一个多分类的逻辑回归模型,
# 以预测'td_data'中的'Total damaged houses (rel.)'列的值。
td_full = sm.GLM(td_data['Total damaged houses (rel.)'], td_data[var_names], family=Binomial(link=logit())).fit()
td_data.shape
# 检查多重共线性
temp_model = td_full
temp_vif = [variance_inflation_factor(td_data[var_names].values, i) for i in range(len(var_names))]
temp_var = ["dummy"]
temp_data = td_data.copy()
while max(temp_vif) > 10:
var_remove = temp_data.columns[2:][temp_vif.index(max(temp_vif))]
temp_var.append(var_remove)
temp_data = temp_data.drop(columns=[var_remove])
temp_model = sm.GLM(temp_data['Total damaged houses (rel.)'], temp_data[temp_data.columns[2:]],
family=Binomial(link=logit())).fit()
temp_vif = [variance_inflation_factor(temp_data[temp_data.columns[2:]].values, i) for i in
range(len(temp_data.columns[2:]))]
temp_vif_with_names = list(zip(temp_data.columns[2:], temp_vif))
pd_temp_vif_with_names = pd.DataFrame(temp_vif_with_names)
pd_temp_vif_with_names.columns = ['Features', 'VIF']
pd_temp_vif_with_names
Features VIF
0 Wind speed 4.024396
1 Distance to typhoon 3.210826
2 rainfall 1.316407
3 distance_first_impact 1.466195
4 Experience 1.509735
5 Elevation 3.483213
6 slope_stdev 3.030725
7 Population 1.889770
8 land_area 2.336313
9 Poverty incidence 1.739308
10 % skilled Agriculture/Forestry/Fishermen 1.291850
11 % strong roof type 1.403770
12 % strong wall type 1.482686
temp_var
['dummy', 'Population density', 'Slope', 'ruggedness_stdev', 'Ruggedness']
# 更新特征
td_col_to_use = temp_data.columns
td_data = td_data[td_data.columns.intersection(td_col_to_use)]
BorutaShap特征选择
X = td_data[['Wind speed', 'Distance to typhoon', 'rainfall', 'distance_first_impact',
'Experience', 'Elevation', 'slope_stdev', 'Population', 'land_area',
'Poverty incidence', '% skilled Agriculture/Forestry/Fishermen',
'% strong roof type', '% strong wall type']]
y = td_data['Total damaged houses (rel.)']
Feature_Selector = BorutaShap(importance_measure='shap', classification=False)
Feature_Selector.fit(X=X, y=y, n_trials=100, random_state=0)
# 采用shap算法,经过100轮计算后得到的各个特征信息如下:
# 9 attributes confirmed important: ['Experience', 'distance_first_impact', 'Wind speed', 'Poverty incidence', 'Elevation', 'Population', '% strong roof type', 'Distance to typhoon', '% strong wall type']
# 3 attributes confirmed unimportant: ['rainfall', 'land_area', 'slope_stdev']
# 1 tentative attributes remains: ['% skilled Agriculture/Forestry/Fishermen']
数据预处理 最终数据集描述信息:
td_data.describe()
Total damaged houses (rel.) Wind speed Distance to typhoon \
count 1405.000000 1.405000e+03 1.405000e+03
mean 0.180416 1.011449e-16 -2.124042e-16
std 0.258533 1.000000e+00 1.000000e+00
min 0.000047 -1.619854e+00 -4.906482e+00
25% 0.003059 -6.406151e-01 -4.960408e-01
50% 0.034921 5.416582e-02 1.040757e-01
75% 0.285269 6.854903e-01 7.271722e-01
max 0.951642 2.143221e+00 1.557193e+00
rainfall distance_first_impact Experience Elevation \
count 1.405000e+03 1.405000e+03 1.405000e+03 1.405000e+03
mean 7.333003e-16 -5.714685e-16 8.091590e-17 2.022897e-17
std 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00
min -4.277711e+00 -5.104474e+00 -1.545198e+00 -3.294853e+00
25% -4.636419e-01 -5.679346e-01 -9.648785e-01 -6.177726e-01
50% 9.712316e-02 1.653390e-01 1.362759e-01 6.635912e-02
75% 6.366046e-01 7.170544e-01 8.923924e-01 7.161769e-01
max 2.501303e+00 1.766611e+00 1.896988e+00 1.972304e+00
slope_stdev Population land_area Poverty incidence \
count 1.405000e+03 1.405000e+03 1.405000e+03 1.405000e+03
mean -8.597314e-17 -5.664113e-16 -1.062021e-16 -4.045795e-17
std 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00
min -3.116919e+00 -4.054597e+00 -3.458290e+00 -1.001795e+00
25% -1.697487e-01 -6.073934e-01 -6.682982e-01 -6.623434e-01
50% 3.434245e-01 4.849719e-02 2.027343e-03 -2.661507e-01
75% 6.702839e-01 6.905744e-01 6.590482e-01 3.871621e-01
max 1.184968e+00 2.551172e+00 2.195896e+00 9.986928e+00
% skilled Agriculture/Forestry/Fishermen % strong roof type \
count 1405.000000 1.405000e+03
mean 0.000000 3.843505e-16
std 1.000000 1.000000e+00
min -0.365098 -7.386823e-01
25% -0.263353 -6.261493e-01
50% -0.187080 -4.621668e-01
75% 0.004608 2.504821e-01
max 29.045222 5.186857e+00
% strong wall type
count 1.405000e+03
mean 8.091590e-17
std 1.000000e+00
min -7.760501e-01
25% -5.753445e-01
50% -2.728524e-01
75% 1.895579e-01
max 1.396548e+01
筛选过特征后的数据直方图:
- 0
- 0
-
分享