4. Baidu Tiangong Image Classification Competition
<pre><code class="language-python">import pandas as pd
import numpy as np</code></pre>
<pre><code class="language-python">df = pd.read_csv('./data/micro_height_data_lable.csv', header=0)</code></pre>
<pre><code class="language-python">df.head()</code></pre>
<pre><code class="language-python">import os
import shutil</code></pre>
<p>Create one folder for each of the 6 classes, which makes importing the data into MATLAB easier.</p>
<pre><code class="language-python"># Create one folder per class under ./data, where the images are moved later
for label in ['OCEAN', 'MOUNTAIN', 'DESERT', 'LAKE', 'FARMLAND', 'CITY']:
    os.makedirs(os.path.join('./data', label), exist_ok=True)</code></pre>
<pre><code class="language-python">shutil.move("./data/multi_test_data_lable.csv", "./data/CITY")  # move a file or directory</code></pre>
<pre><code>'out: ./data/CITY\\multi_test_data_lable.csv'</code></pre>
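`shutil.move` behaves differently depending on whether the destination already exists as a directory: if it does, the file is moved inside it (which is why the output above ends with `CITY\\multi_test_data_lable.csv`); otherwise the file is simply renamed to the destination path. The minimal sketch below demonstrates both cases in a temporary directory; the file and folder names are made up for illustration.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as root:
    # A dummy file plus an existing destination folder
    src = os.path.join(root, "labels.csv")
    with open(src, "w") as f:
        f.write("img_name,lables\n")
    os.mkdir(os.path.join(root, "CITY"))

    # Case 1: destination is an existing directory, so the file is moved inside it
    moved_into = shutil.move(src, os.path.join(root, "CITY"))
    case1_ok = os.path.isfile(os.path.join(root, "CITY", "labels.csv"))

    # Case 2: destination does not exist, so the file is renamed to that path
    renamed = shutil.move(moved_into, os.path.join(root, "renamed.csv"))
    case2_ok = os.path.isfile(renamed) and not os.path.exists(moved_into)

print(os.path.basename(os.path.dirname(moved_into)), os.path.basename(renamed))
```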
<pre><code class="language-python"># Image directory
file_dir = "D:\\work\\jupyterNotebook\\baidu_dianshi\\data\\multi_test_data"
# Read the label data
df = pd.read_csv('./data/multi_test_data_lable.csv', header=0)
# Get the labels and image names
lables = df['lables']
img_name = df['img_name']
# Walk through every image in the folder and look up its name in img_name;
# once found, use the matching row index to move the file into its class folder
for file in os.listdir(file_dir):
    # img_path = os.path.join(file_dir, file)  # path of each image
    # img = Image.open(img_path)
    for i, img in enumerate(img_name):
        if file == img:
            shutil.move(os.path.join(file_dir, img), os.path.join("./data", lables[i]))
            break</code></pre>
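The nested loop above rescans `img_name` for every file. A minimal sketch of the same idea with a dictionary lookup follows; `sort_images_by_label` and its parameters are hypothetical names, and it assumes the label table has already been turned into a `{file name: class name}` mapping (for example with `dict(zip(df['img_name'], df['lables']))`). It also creates each class folder under `dest_root` on demand, so the folders need not exist beforehand.

```python
import os
import shutil
import tempfile


def sort_images_by_label(image_dir, name_to_label, dest_root):
    """Move each file in image_dir into dest_root/<label> (hypothetical helper).

    Files whose names are not in name_to_label are left where they are.
    Returns the number of files moved.
    """
    moved = 0
    for file in os.listdir(image_dir):  # listdir returns a snapshot list, safe to move from
        label = name_to_label.get(file)  # O(1) lookup instead of a linear scan
        if label is None:
            continue
        target_dir = os.path.join(dest_root, label)
        os.makedirs(target_dir, exist_ok=True)  # create the class folder if needed
        shutil.move(os.path.join(image_dir, file), target_dir)
        moved += 1
    return moved


# Usage on a throwaway directory with empty dummy files
with tempfile.TemporaryDirectory() as root:
    src = os.path.join(root, "images")
    os.mkdir(src)
    for name in ["a.jpg", "b.jpg", "c.jpg"]:
        open(os.path.join(src, name), "w").close()
    labels = {"a.jpg": "OCEAN", "b.jpg": "CITY"}  # c.jpg has no label
    n_moved = sort_images_by_label(src, labels, root)
    ocean_files = os.listdir(os.path.join(root, "OCEAN"))
    city_files = os.listdir(os.path.join(root, "CITY"))
    left_over = os.listdir(src)
```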
<hr />
<pre><code class="language-python">import numpy as np
from PIL import Image</code></pre>
<pre><code class="language-python">def read_image(img_name):
    im = Image.open(img_name)  # append .convert('L') to load the image as grayscale
    data = np.array(im)
    return data</code></pre>
<pre><code class="language-python">import os
images = []
# Image directory
file_dir = "D:\\work\\jupyterNotebook\\baidu_dianshi\\data\\pre_data"
for file in os.listdir(file_dir):
    img_path = os.path.join(file_dir, file)  # path of each image
    images.append(read_image(img_path))</code></pre>
<p>Find the images in the directory whose shape is not (256, 256, 3)</p>
<pre><code class="language-python"># Print the index of every image whose shape is not (256, 256, 3)
for j, img in enumerate(images):
    if img.shape != (256, 256, 3):
        print(j)</code></pre>
<pre><code class="language-python">X_img = np.array(images)</code></pre>
<pre><code>
ValueError Traceback (most recent call last)
<ipython-input-15-0f8e6620abfa> in <module>()
----> 1 X_img = np.array(images)
ValueError: could not broadcast input array from shape (256,256,3) into shape (256)</code></pre>
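The conversion fails because, as the previous step shows, not every image has shape (256, 256, 3), so NumPy cannot pack them into a single numeric array. One possible fix, sketched below under the assumption that the odd images are grayscale (H, W) or RGBA (H, W, 4) arrays with the same height and width, is to bring every array to three channels before calling `np.stack`. `to_three_channels` is a hypothetical helper and the arrays are dummy data; with PIL one could instead open each file as `Image.open(path).convert('RGB')` inside `read_image`. Images of a different height or width would still need resizing.

```python
import numpy as np


def to_three_channels(img):
    """Return an (H, W, 3) array for grayscale, RGB or RGBA input (hypothetical helper)."""
    if img.ndim == 2:  # grayscale: repeat the single channel three times
        return np.stack([img, img, img], axis=-1)
    if img.shape[-1] == 4:  # RGBA: drop the alpha channel
        return img[..., :3]
    return img  # assumed to be (H, W, 3) already


# Dummy data: two RGB images, one grayscale, one RGBA
images = [
    np.zeros((256, 256, 3), dtype=np.uint8),
    np.zeros((256, 256), dtype=np.uint8),
    np.zeros((256, 256, 4), dtype=np.uint8),
    np.ones((256, 256, 3), dtype=np.uint8),
]
bad = [j for j, img in enumerate(images) if img.shape != (256, 256, 3)]
X_img = np.stack([to_three_channels(img) for img in images])
```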
<p>Import MATLAB .mat variables into Python.
The MATLAB variables are cell arrays of strings.</p>
<pre><code class="language-python">import numpy as np
import pandas as pd
import scipy.io as sio</code></pre>
<pre><code class="language-python">pfile = sio.loadmat('./pfile.mat')</code></pre>
<pre><code class="language-python">pfile</code></pre>
<pre><code> {'__globals__': [],
'__header__': b'MATLAB 5.0 MAT-file, Platform: PCWIN64, Created on: Mon Nov 5 16:59:22 2018',
'__version__': '1.0',
'pfiles': array([[array(['D:\\work\\jupyterNotebook\\baidu_dianshi\\data\\pre_data\\MWI_KD96UmGQ6KWVeohF.jpg'],
dtype='<U76')],
[array(['D:\\work\\jupyterNotebook\\baidu_dianshi\\data\\pre_data\\MWI_KDDTIKoSiQgwZiUQ.jpg'],
dtype='<U76')],
...............
[array(['D:\\work\\jupyterNotebook\\baidu_dianshi\\data\\pre_data\\MWI_s8cXI99nJNDiXXNc.jpg'],
dtype='<U76')],
[array(['D:\\work\\jupyterNotebook\\baidu_dianshi\\data\\pre_data\\MWI_s8gMVZYugHHFmUyg.jpg'],
dtype='<U76')]], dtype=object)}
</code></pre>
<pre><code class="language-python">pf = pfile['pfiles']</code></pre>
<pre><code class="language-python">pf[0,0]</code></pre>
<pre><code>array(['D:\\work\\jupyterNotebook\\baidu_dianshi\\data\\pre_data\\MWI_KD96UmGQ6KWVeohF.jpg'],
dtype='<U76')</code></pre>
<pre><code class="language-python">pPre = sio.loadmat('./cell_str.mat')</code></pre>
<pre><code class="language-python">pPre</code></pre>
<pre><code> {'__globals__': [],
'__header__': b'MATLAB 5.0 MAT-file, Platform: PCWIN64, Created on: Mon Nov 5 17:12:21 2018',
'__version__': '1.0',
'cell_pre': array([[array(['DESERT'], dtype='<U6')],
[array(['OCEAN'], dtype='<U5')],
[array(['MOUNTAIN'], dtype='<U8')],
[array(['MOUNTAIN'], dtype='<U8')],
[array(['LAKE'], dtype='<U4')],
[array(['DESERT'], dtype='<U6')],
[array(['LAKE'], dtype='<U4')],
[array(['LAKE'], dtype='<U4')],
[array(['DESERT'], dtype='<U6')],
[array(['DESERT'], dtype='<U6')],
...........
[array(['OCEAN'], dtype='<U5')],
[array(['LAKE'], dtype='<U4')],
[array(['DESERT'], dtype='<U6')]], dtype=object)}</code></pre>
<pre><code class="language-python">cell_pre = pPre['cell_pre']</code></pre>
<pre><code class="language-python">cell_pre[0]</code></pre>
<pre><code>array([array(['DESERT'], dtype='<U6')], dtype=object)</code></pre>
<pre><code class="language-python">pf[5,0][0]</code></pre>
<pre><code>'D:\\work\\jupyterNotebook\\baidu_dianshi\\data\\pre_data\\MWI_KGnibGL9KFBpcvjV.jpg'</code></pre>
<pre><code class="language-python">file_name_list = []
for i in range(1000):
    # drop the 52-character directory prefix, keeping only the file name
    file_name_list.append(pf[i, 0][0][52:])</code></pre>
<pre><code class="language-python">file_name_list</code></pre>
<pre><code> ['MWI_KD96UmGQ6KWVeohF.jpg',
'MWI_KDDTIKoSiQgwZiUQ.jpg',
'MWI_KERiJD55HvBKIhmL.jpg',
'MWI_KF9KpqQNNMVSsUKH.jpg',
'MWI_KFSnYaR40ro5CsiG.jpg',
'MWI_KGnibGL9KFBpcvjV.jpg',
..........
'MWI_s8Jos2wcHcmTqMJ6.jpg',
'MWI_s8cXI99nJNDiXXNc.jpg',
'MWI_s8gMVZYugHHFmUyg.jpg']</code></pre>
<pre><code class="language-python">cell_pre[0][0][0]</code></pre>
<pre><code>'DESERT'</code></pre>
<pre><code class="language-python">pre_name_list = []
for i in range(1000):
    pre_name_list.append(cell_pre[i][0][0])</code></pre>
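Both loops above peel the same two layers of nesting that `scipy.io.loadmat` produces for a MATLAB cell array of strings: an (N, 1) object array whose elements are 1-element string arrays. A small helper, sketched here under the hypothetical name `cellstr_to_list`, does this for any length instead of hard-coding 1000. The test data is built by hand to mimic that layout. (`loadmat` also accepts `squeeze_me=True`, which removes singleton dimensions and may make this unwrapping unnecessary.)

```python
import numpy as np


def cellstr_to_list(cell):
    """Flatten a loadmat-style cell array of strings into a list of str (hypothetical helper)."""
    # cell.ravel() walks the (N, 1) object array; each element is a 1-element
    # string array such as array(['DESERT'], dtype='<U6'), so [0] extracts the string
    return [str(item[0]) for item in cell.ravel()]


# Hand-built stand-in for pPre['cell_pre'] with shape (3, 1)
cell = np.empty((3, 1), dtype=object)
for i, label in enumerate(["DESERT", "OCEAN", "MOUNTAIN"]):
    cell[i, 0] = np.array([label])

pre_name_list = cellstr_to_list(cell)
```

With the real data, the file names would then be `[p[52:] for p in cellstr_to_list(pf)]`.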
<pre><code class="language-python">pre_name_list</code></pre>
<pre><code>['DESERT',
'OCEAN',
'MOUNTAIN',
'MOUNTAIN',
.....
'DESERT',
'FARMLAND',
'OCEAN',
'LAKE',
'DESERT']</code></pre>
<p>Slicing the string: the directory prefix is 52 characters long</p>
<pre><code class="language-python">a ='D:\\work\\jupyterNotebook\\baidu_dianshi\\data\\pre_data\\'
len(a)</code></pre>
<pre><code>52</code></pre>
<pre><code class="language-python">pf[5,0][0][52:]</code></pre>
<pre><code>'MWI_KGnibGL9KFBpcvjV.jpg'</code></pre>
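Hard-coding the offset 52 only works as long as the directory never changes. A more robust alternative, assuming the stored paths are always Windows-style, is the standard-library `ntpath.basename`, which splits on backslashes even when the notebook runs on Linux or macOS (plain `os.path.basename` would not split on backslashes there).

```python
import ntpath

path = 'D:\\work\\jupyterNotebook\\baidu_dianshi\\data\\pre_data\\MWI_KGnibGL9KFBpcvjV.jpg'
prefix = 'D:\\work\\jupyterNotebook\\baidu_dianshi\\data\\pre_data\\'

by_offset = path[len(prefix):]        # same as path[52:], but the offset is computed
by_basename = ntpath.basename(path)   # splits on '\\' regardless of the host OS
print(by_offset, by_basename)
```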
<hr />
<pre><code class="language-python">import pandas as pd
from pandas import Series,DataFrame</code></pre>
<pre><code class="language-python">frame1 = DataFrame(file_name_list)</code></pre>
<pre><code class="language-python">frame1</code></pre>
<pre><code>                              0
0      MWI_KD96UmGQ6KWVeohF.jpg
1      MWI_KDDTIKoSiQgwZiUQ.jpg
2      MWI_KERiJD55HvBKIhmL.jpg
3      MWI_KF9KpqQNNMVSsUKH.jpg
...                         ...
10     MWI_KKYSmFI1jRRVBfXF.jpg
...                         ...
992    MWI_s2l0T218Fq2lqtFS.jpg
993    MWI_s4BD8Ud5Gu91QwMl.jpg
994    MWI_s5GwvnKVlu0HDG2h.jpg
995    MWI_s5r88rnOtm3KriE1.jpg
996    MWI_s7JexiPnRc2cTiyZ.jpg
997    MWI_s8Jos2wcHcmTqMJ6.jpg
998    MWI_s8cXI99nJNDiXXNc.jpg
999    MWI_s8gMVZYugHHFmUyg.jpg</code></pre>
<p>1000 rows × 1 columns</p>
<pre><code class="language-python">frame1_name = DataFrame(pre_name_list)</code></pre>
<pre><code class="language-python">frame1_name</code></pre>
<pre><code>              0
0        DESERT
1         OCEAN
2      MOUNTAIN
3      MOUNTAIN
4          LAKE
5        DESERT
6          LAKE
7          LAKE
...         ...
979       OCEAN
980      DESERT
...         ...</code></pre>
<p>1000 rows × 1 columns</p>
<p>Concatenate the two columns side by side</p>
<pre><code class="language-python">result = pd.concat([frame1,frame1_name],axis=1)
result</code></pre>
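Because both frames have a single column named `0`, the concatenated table ends up with two columns that are both called `0`, and that is also the header `to_csv` writes. A sketch that builds the table in one step with explicit column names is shown below; the names `img_name` and `label` are illustrative choices, not ones required by the competition, and the two short lists stand in for `file_name_list` and `pre_name_list`.

```python
import pandas as pd

# Stand-ins for file_name_list and pre_name_list (same order, same length)
file_name_list = ['MWI_KD96UmGQ6KWVeohF.jpg', 'MWI_KDDTIKoSiQgwZiUQ.jpg']
pre_name_list = ['DESERT', 'OCEAN']

# One DataFrame with named columns instead of concatenating two unnamed ones
result = pd.DataFrame({'img_name': file_name_list, 'label': pre_name_list})
csv_text = result.to_csv(index=False)  # returns the CSV as a string when no path is given
print(csv_text)
```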
<p>Save to a CSV file</p>
<pre><code class="language-python">result.to_csv("save_data.csv", index =False)</code></pre>
<p>Save without the header row</p>
<pre><code class="language-python">result.to_csv("save_data2.csv", index=False, header=False)</code></pre>