ACC
<p>[TOC]</p>
<pre><code class="language-python">%matplotlib inline
%load_ext autoreload
%autoreload 2
import meteva.base as meb
import meteva.method as mem
import meteva.perspact as mps
import numpy as np
import pandas as pd
import datetime
import meteva</code></pre>
<h2>ACC(距平相关系数)概述</h2>
<p><font face="黑体" color=Black size=4>
<br>将预报场和观测场都扣除气候平均之后,再计算相关系数。<br></font></p>
<p>在应用该算法时须了解以下几点: </p>
<ol>
<li>气候平均场是基于1989-2008年的ERA-40,采用61天的窗口滑动平均得到。例如7曰31日08时的平均场,使用的是20年内每一年的7曰1日至8曰30日每日的08时的z500取平均得到,每个时刻的平均值使用到了20×61=1220个再分析数据。 </li>
<li>气候平均场以配置文件形式存放在meteva安装目录的resources 文件夹下,但考虑到文件较大,默认不随MetEva一起自动安装,但回在第一次运行ACC算法时自动下载到指定目录,如果自动下载失败,则需要手动下载文件 <a href="https://github.com/nmcdev/meteva/blob/master/meteva/resources/z500_cli.nc">https://github.com/nmcdev/meteva/blob/master/meteva/resources/z500_cli.nc</a><br />
并存放在MetEva安装目录的resources 文件夹。 </li>
<li>气候平均场中只包含每日00UTC和12UTC((北京时08和20时)的结果,检验的对象也需是这些时刻 </li>
<li>气候平均场的空间分辨率是1.5°×1.5°,范围是全球,如果检验对象的网格和此不一致,则采用双线性插值方法将气候平均场插值到检验对象所在的网格上。 </li>
<li>气候平均场的数据有效位数为小数点后1位。 </li>
<li>气候平均场的单位时位势,而不是位势米,当检验对象时位势米时,需将检验对象乘以9.8</li>
</ol>
<pre><code class="language-python">print(meteva.base.z500_cli) # z500_cli文件的存储路径</code></pre>
<pre><code>h:\task\develop\python\git\meteva_\meteva\resources\z500_cli.nc</code></pre>
<pre><code class="language-python">grd_climate = meb.read_griddata_from_nc(meteva.base.z500_cli)
print(grd_climate)</code></pre>
<pre><code>&lt;xarray.DataArray &#039;data0&#039; (member: 1, level: 1, time: 732, dtime: 1, lat: 121,
lon: 240)&gt;
array([[[[[[50282.6, 50282.6, 50282.6, ..., 50282.6, 50282.6,
50282.6],
[50293.4, 50295.1, 50296.9, ..., 50287.9, 50289.6,
50291.4],
[50286.1, 50290.1, 50294.1, ..., 50274.6, 50278.4,
50282.1],
...,
[49725.4, 49725.9, 49726.6, ..., 49723.1, 49723.9,
49724.6],
[49736.9, 49737.1, 49737.4, ..., 49736.1, 49736.4,
49736.6],
[49767.4, 49767.4, 49767.4, ..., 49767.4, 49767.4,
49767.4]]],
[[[50283.1, 50283.1, 50283.1, ..., 50283.1, 50283.1,
50283.1],
[50294.1, 50295.8, 50297.8, ..., 50288.3, 50290.3,
50292.1],
[50287.1, 50291.1, 50295.3, ..., 50274.8, 50278.8,
...
49727.6],
[49739.1, 49739.4, 49739.6, ..., 49738.4, 49738.6,
49738.9],
[49769.4, 49769.4, 49769.4, ..., 49769.4, 49769.4,
49769.4]]],
[[[50278. , 50278. , 50278. , ..., 50278. , 50278. ,
50278. ],
[50288.7, 50290.7, 50292.5, ..., 50283. , 50285. ,
50286.7],
[50281.5, 50285.7, 50290. , ..., 50269.2, 50273.2,
50277.5],
...,
[49731.5, 49732. , 49732.7, ..., 49729.5, 49730. ,
49730.7],
[49749.5, 49749.7, 49750.2, ..., 49748.7, 49749. ,
49749.2],
[49777.5, 49777.5, 49777.5, ..., 49777.5, 49777.5,
49777.5]]]]]])
Coordinates:
* member (member) &lt;U5 &#039;data0&#039;
* level (level) int32 500
* time (time) datetime64[ns] 2020-01-01 ... 2020-12-31T12:00:00
* dtime (dtime) int32 0
* lat (lat) float64 -90.0 -88.5 -87.0 -85.5 -84.0 ... 85.5 87.0 88.5 90.0
* lon (lon) float64 0.0 1.5 3.0 4.5 6.0 ... 352.5 354.0 355.5 357.0 358.5
Attributes:
dtime_units: hour</code></pre>
<p>在z500_cli的气候态数据中732个时刻(每日的00和12时)平面场,气候数据是没有具体年份的,但为了读取方便,将年份记录在2020年。如果需要准备其它变量的气候态数据,需要和z500_cli保持相同的格式。</p>
<h1>单时刻ACC计算</h1>
<p><font face="黑体" color=Blue size=3><strong>acc(grd_ob, grd_fo,climate_path = None)</strong></font> </p>
<p>计算单个网格预报场和实况场之间的距平相关系数</p>
<table>
<thead>
<tr>
<th style="text-align: left;">参数</th>
<th style="text-align: center;">说明</th>
<th style="text-align: left;">备注</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: left;"><strong><font face="黑体" color=Blue size=5>grd_ob</font></strong></td>
<td style="text-align: center;">网格观测数据</td>
<td style="text-align: left;"><a href="https://www.showdoc.com.cn/meteva?page_id=3975600815874861">网格数据格式</a></td>
</tr>
<tr>
<td style="text-align: left;"><strong><font face="黑体" color=Blue size=5>grd_fo</font></strong></td>
<td style="text-align: center;">网格预报数据</td>
<td style="text-align: left;"><a href="https://www.showdoc.com.cn/meteva?page_id=3975600815874861">网格数据格式</a></td>
</tr>
<tr>
<td style="text-align: left;">climate_path</td>
<td style="text-align: center;">气候态数据的文件路径</td>
<td style="text-align: left;">缺省情况下该路径会指向 meteva/resources/z500_cli.nc 文件所在的路径,如果检验评估的对象不是z500,则需要采用上述方法准备一年气候态数据,并通过climate_path 传入该气候态数据的文件路径</td>
</tr>
<tr>
<td style="text-align: left;"><font face="黑体" color=blue size=5>return</font></td>
<td style="text-align: center;">实数</td>
<td style="text-align: left;"></td>
</tr>
</tbody>
</table>
<p><strong>调用示例</strong></p>
<pre><code class="language-python">path_ec = r&quot;H:\test_data\input\mem\acc\ECMWF\z500\22050108.012.nc&quot;
grd_ec = meb.read_griddata_from_nc(path_ec) #读取预报场数据
print(grd_ec) </code></pre>
<pre><code>&lt;xarray.DataArray &#039;data0&#039; (member: 1, level: 1, time: 1, dtime: 1, lat: 241,
lon: 281)&gt;
array([[[[[[5857.1, 5857.6, 5857.9, ..., 5870.1, 5870.1, 5869.6],
[5857.4, 5857.6, 5857.4, ..., 5869.6, 5869.9, 5869.6],
[5857.4, 5857.4, 5857.4, ..., 5869.4, 5869.9, 5870.1],
...,
[5397.9, 5400.1, 5402.4, ..., 5461.4, 5463.4, 5465.6],
[5390.1, 5392.1, 5394.1, ..., 5461.9, 5464.1, 5466.4],
[5382.4, 5384.4, 5386.4, ..., 5462.1, 5464.4, 5466.6]]]]]])
Coordinates:
* member (member) &lt;U5 &#039;data0&#039;
* level (level) float64 500.0
* time (time) datetime64[ns] 2022-05-01T08:00:00
* dtime (dtime) int32 12
* lat (lat) float64 0.0 0.25 0.5 0.75 1.0 ... 59.0 59.25 59.5 59.75 60.0
* lon (lon) float64 70.0 70.25 70.5 70.75 ... 139.2 139.5 139.8 140.0
Attributes:
dtime_units: hour</code></pre>
<pre><code class="language-python">path_ec0 = r&quot;H:\test_data\input\mem\acc\ECMWF\z500\22050120.000.nc&quot;
grd_ec0 = meb.read_griddata_from_nc(path_ec0) #读取实况场数据,以模式的零场来代表实况
print(grd_ec0)</code></pre>
<pre><code>&lt;xarray.DataArray &#039;data0&#039; (member: 1, level: 1, time: 1, dtime: 1, lat: 241,
lon: 281)&gt;
array([[[[[[5856.4, 5856.4, 5856.7, ..., 5864.4, 5864.7, 5865.4],
[5856.7, 5856.7, 5856.4, ..., 5865.2, 5865.4, 5865.9],
[5857.2, 5856.7, 5856.7, ..., 5866.2, 5866.2, 5866.2],
...,
[5399.7, 5401.9, 5404.4, ..., 5460.2, 5462.4, 5464.4],
[5392.2, 5394.2, 5396.7, ..., 5460.4, 5462.7, 5464.9],
[5384.4, 5386.4, 5388.9, ..., 5460.7, 5462.9, 5465.2]]]]]])
Coordinates:
* member (member) &lt;U5 &#039;data0&#039;
* level (level) float64 500.0
* time (time) datetime64[ns] 2022-05-01T20:00:00
* dtime (dtime) int32 0
* lat (lat) float64 0.0 0.25 0.5 0.75 1.0 ... 59.0 59.25 59.5 59.75 60.0
* lon (lon) float64 70.0 70.25 70.5 70.75 ... 139.2 139.5 139.8 140.0
Attributes:
dtime_units: hour</code></pre>
<pre><code class="language-python">grd_ec.values *= 9.8 #根据数据结果判断预报和实况的单位都是位势米,因此乘以9.8
grd_ec0.values *= 9.8
acc1 = mem.acc(grd_ec0,grd_ec) #调用距平相关系数计算函数进行计算
print(acc1)</code></pre>
<pre><code>0.9985477061247351</code></pre>
<p>函数 mem.acc.acc 只能计算单个时刻预报和实况之间的距平相关系数,如果需要计算任意时段内的ACC则需要首先计算各时刻的相关系数中间量,再利用透视分析功能,计算任意时段的ACC。</p>
<h1>任意时段ACC计算</h1>
<h2>中间量统计</h2>
<p><font face="黑体" color=Blue size=3><strong>acc_middle(para_acc)</strong></font> </p>
<p>为统计任意时段的acc准备中间量数据。该程序可以自动跳过已经统计过的起报时间和时效。</p>
<table>
<thead>
<tr>
<th style="text-align: left;">参数</th>
<th style="text-align: center;">说明</th>
<th style="text-align: left;">备注</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: left;"><strong><font face="黑体" color=Blue size=5>para_acc</font></strong></td>
<td style="text-align: center;">待统计的观测和预报数据和检验参数描述</td>
<td style="text-align: left;">字典形式,内容见下面的示例</td>
</tr>
<tr>
<td style="text-align: left;"><font face="黑体" color=blue size=5>return</font></td>
<td style="text-align: center;">无</td>
<td style="text-align: left;">以hdf文件格式将记录了每个起报时刻、每个时效、每个模式、每个区域内观测和预报的样本数、观测平均值、预报平均值、观测方差、预报方差、协方差的pandas.DataFrame格式数据输出到文件中</td>
</tr>
</tbody>
</table>
<p><strong>调用示例</strong></p>
<pre><code class="language-python">para_acc = {
&quot;begin_time&quot;: datetime.datetime(2022,5,1,8), #起始时间(预报)
&quot;end_time&quot;: datetime.datetime(2022,6,1,8), #结束时间(预报)
&quot;time_step&quot;:12, #起报时间间隔
&quot;middle_result_path&quot;: r&quot;H:\test_data\output\method\space\acc\tmmsss_z500.parquet&quot;,
&quot;climate_path&quot;:None, #气候态数据的文件路径,缺省情况下指向z500_cli的路径
&quot;grid&quot; : meb.grid([70,140,0.25],[0,60,0.25]), # 检验区域
&quot;step&quot;:2, #masker间隔(单位:°),该参数只能设置为None或数据网格距的倍数,无需绘制ACC空间分布时设为None
&quot;ob_data&quot;: {
&quot;dir_ob&quot;:r&quot;H:\test_data\input\mem\acc\ECMWF\z500\YYMMDDHH.TTT.nc&quot;, # 实况场数据路径
&quot;read_method&quot;: meteva.base.io.read_griddata_from_nc, #实况数据读取函数
&quot;read_para&quot;: {}, #实况数据读取参数
&quot;time_type&quot;: &quot;BT&quot;, #实况数据时间类型
&quot;multiple&quot;: 9.8, #当文件数据的单位是位势米,通过乘以9.8转换成位势
},
&quot;fo_data&quot;: {
&quot;ECMWF&quot;: {
&quot;dir_fo&quot;:r&quot;H:\test_data\input\mem\acc\ECMWF\z500\YYMMDDHH.TTT.nc&quot;, #预报数据路径
&quot;dtime&quot;: [12, 72, 12], #检验时效
&quot;read_method&quot;: meteva.base.io.read_griddata_from_nc, #预报数据读取函数
&quot;read_para&quot;: {}, #预报数据读取参数
&quot;time_type&quot;: &quot;BT&quot;,#预报数据时间类型是北京时,即08时起报
&quot;multiple&quot;:9.8, #当文件数据的单位是位势米,通过乘以9.8转换成位势
},
&quot;NCEP&quot;: {
&quot;dir_fo&quot;: r&quot;H:\test_data\input\mem\acc\NCEP\z500\YYMMDDHH.TTT.nc&quot;, #预报数据路径
&quot;dtime&quot;: [12, 72, 12], #检验时效
&quot;read_method&quot;: meteva.base.io.read_griddata_from_nc, #预报数据读取函数
&quot;read_para&quot;: {},#预报数据读取参数
&quot;time_type&quot;: &quot;UT&quot;, #预报数据时间类型是世界时,即00时起报
&quot;multiple&quot;: 1, #当文件数据的单位是位势,则×1表示,不变
},
},
}</code></pre>
<pre><code class="language-python">mem.acc_middle_z500(para_acc) #执行中间量统计程序</code></pre>
<pre><code>success read from H:\test_data\input\mem\acc\ECMWF\z500\22050120.000.nc
success read from H:\test_data\input\mem\acc\ECMWF\z500\22050108.012.nc
success read from H:\test_data\input\mem\acc\ECMWF\z500\22050208.000.nc
success read from H:\test_data\input\mem\acc\ECMWF\z500\22050108.024.nc
success read from H:\test_data\input\mem\acc\ECMWF\z500\22050220.000.nc
success read from H:\test_data\input\mem\acc\ECMWF\z500\22050108.036.nc
success read from H:\test_data\input\mem\acc\ECMWF\z500\22050308.000.nc
... ...
中间量统计程序运行完毕</code></pre>
<pre><code class="language-python">df_mid = pd.read_parquet(para_acc[&quot;middle_result_path&quot;]) #读取程序输出的中间量
print(df_mid) #查看中间量</code></pre>
<pre><code> level time dtime member id T MX \
0 500.0 2022-05-01 08:00:00 12 ECMWF 4096 63.901637 -100.572688
0 500.0 2022-05-01 08:00:00 12 ECMWF 4098 63.901637 -105.099722
0 500.0 2022-05-01 08:00:00 12 ECMWF 4100 63.901637 -108.437408
0 500.0 2022-05-01 08:00:00 12 ECMWF 4102 63.901637 -89.379083
0 500.0 2022-05-01 08:00:00 12 ECMWF 4104 63.901637 -78.855785
.. ... ... ... ... ... ... ...
0 500.0 2022-05-09 20:00:00 72 NCEP 6134 63.740954 134.001297
0 500.0 2022-05-09 20:00:00 72 NCEP 6136 63.740954 142.230275
0 500.0 2022-05-09 20:00:00 72 NCEP 6138 63.740954 121.335484
0 500.0 2022-05-09 20:00:00 72 NCEP 6140 63.740954 114.683497
0 500.0 2022-05-09 20:00:00 72 NCEP 4094 63.901637 86.583352
MY SX SY SXY
0 -81.292262 53.979844 55.923339 18.101133
0 -73.188211 33.639016 34.036308 9.101492
0 -70.538050 102.656192 30.606767 -5.461846
0 -91.783602 146.955801 180.611761 141.097535
0 -96.083767 129.294580 65.650976 63.215796
.. ... ... ... ...
0 102.832330 210.316643 20.517030 47.249523
0 107.483435 114.212685 4.790481 3.222098
0 104.117095 20.013991 13.666516 -5.126891
0 91.253649 64.993507 32.597502 14.605424
0 104.995706 121.546486 31.746516 18.075971
[237708 rows x 11 columns]</code></pre>
<h2>基于中间量计算ACC</h2>
<pre><code class="language-python">#绘制ACC随时效的变化图
score,gdict = mps.score_df(df_mid,mem.corr,g = [&quot;member&quot;,&quot;dtime&quot;],plot = &quot;line&quot;,
xlabel = &quot;时效(单位:小时)&quot;,ylabel = &quot;ACC&quot;,title = &quot;acc随时效的变化&quot;,grid = True,
height = 4,width = 7,sup_fontsize = 20)</code></pre>
<p><img src="https://www.showdoc.com.cn/server/api/attachment/visitFile?sign=fa8e5ae8f260fba84b9aa3c9c79fd5e4&amp;file=file.png" alt="" /></p>
<pre><code class="language-python">#选取部分样本,绘制ACC随时间的变化
score,gdict = mps.score_df(df_mid,mem.corr,s = {&quot;dtime&quot;:72},g = [&quot;member&quot;,&quot;time&quot;],plot = &quot;line&quot;,
xlabel = &quot;起报时间&quot;,ylabel = &quot;ACC&quot;,title = &quot;072H时效acc随时间的变化&quot;,grid = True,
height = 4,width = 7,sup_fontsize = 16)</code></pre>
<p><img src="https://www.showdoc.com.cn/server/api/attachment/visitFile?sign=50afa1d23f2cedbb7188d5804b67f165&amp;file=file.png" alt="" /></p>
<pre><code class="language-python">#绘制ACC的空间分布图,前提是para_acc中step的取值不为None
grd_score_list,gdict=mps.score_xy_df(df_mid,mem.corr,s = {&quot;dtime&quot;:72},g = [&quot;member&quot;],ncol = 2)</code></pre>
<p><img src="https://www.showdoc.com.cn/server/api/attachment/visitFile?sign=65c8aff677e02f62346e13130402537b&amp;file=file.png" alt="" /></p>
<h1>基于中间量计算activity</h1>
<p>activity 的本质就是去掉距平后的观测(预报)要素场的时空分布标准差,因此可以用mem.ob_fo_std指标来进行计算</p>
<pre><code class="language-python">#绘制activity随时效的变化图
score,gdict = mps.score_df(df_mid,mem.ob_fo_std,g = [&quot;member&quot;,&quot;dtime&quot;],plot = &quot;line&quot;,
xlabel = &quot;时效(单位:小时)&quot;,ylabel = &quot;activity&quot;,title = &quot;activity随时效的变化&quot;,grid = True,
height = 4,width = 7,sup_fontsize = 20)</code></pre>
<p><img src="https://www.showdoc.com.cn/server/api/attachment/visitFile?sign=31320bb564968ea926df1d87b4a40f3f&amp;file=file.png" alt="" /></p>
<p>因为上面的示例中用到的数据只有很少的几天,因此实况的activity的曲线也并不平稳,当数据序列增加后这个问题不再存在。</p>
<pre><code class="language-python"></code></pre>