

4. Baidu Tiangong Image Classification Competition

<pre><code class="language-python">import pandas as pd
import numpy as np</code></pre>
<pre><code class="language-python"># read_csv already returns a DataFrame
df = pd.read_csv('./data/micro_height_data_lable.csv', header=0)</code></pre>
<pre><code class="language-python">df.head()</code></pre>
<pre><code class="language-python">import os
import shutil</code></pre>
<p>Create one folder per class (six in total) so the data is easy to import into MATLAB.</p>
<pre><code class="language-python"># These folders are created relative to the current working directory
os.mkdir('OCEAN')
os.mkdir('MOUNTAIN')
os.mkdir('DESERT')
os.mkdir('LAKE')
os.mkdir('FARMLAND')
os.mkdir('CITY')</code></pre>
<pre><code class="language-python">shutil.move("./data/multi_test_data_lable.csv", "./data/CITY")  # move a file or directory</code></pre>
<pre><code>'out: ./data/CITY\\multi_test_data_lable.csv'</code></pre>
<pre><code class="language-python"># Image directory
file_dir = "D:\\work\\jupyterNotebook\\baidu_dianshi\\data\\multi_test_data"
# Read the label data
df = pd.read_csv('./data/multi_test_data_lable.csv', header=0)
# Get the labels and the image names
lables = df['lables']
img_name = df['img_name']
# Walk every file in the image folder and look its name up in img_name.
# On a match, use the row index to move the file into that row's class folder.
for file in os.listdir(file_dir):
    for i, img in enumerate(img_name):
        if file == img:
            shutil.move(os.path.join(file_dir, img), "./data/" + lables[i])
            break</code></pre>
<hr />
<hr />
<pre><code class="language-python">import numpy as np
from PIL import Image</code></pre>
<pre><code class="language-python">def read_image(img_name):
    """Load an image file and return it as a NumPy array."""
    im = Image.open(img_name)  # add .convert('L') for grayscale
    data = np.array(im)
    return data</code></pre>
<pre><code class="language-python">import os

images = []
# Image directory
file_dir = "D:\\work\\jupyterNotebook\\baidu_dianshi\\data\\pre_data"
for file in os.listdir(file_dir):
    img_path = os.path.join(file_dir, file)  # path of each image
    images.append(read_image(img_path))</code></pre>
<p>Find the images in the directory whose shape is not (256, 256, 3).</p>
<pre><code class="language-python"># Print the list index of every image with an unexpected shape
for j, img in enumerate(images):
    if img.shape != (256, 256, 3):
        print(j)</code></pre>
<pre><code
class="language-python">X_img = np.array(images)</code></pre>
<pre><code>ValueError                                Traceback (most recent call last)
&lt;ipython-input-15-0f8e6620abfa&gt; in &lt;module&gt;()
----&gt; 1 X_img = np.array(images)

ValueError: could not broadcast input array from shape (256,256,3) into shape (256)</code></pre>
<p>The conversion most likely fails because at least one image does not have shape (256, 256, 3), so the list cannot be stacked into a single array.</p>
<p>Import MATLAB <code>.mat</code> variables into Python. The MATLAB variables are cell arrays.</p>
<pre><code class="language-python">import numpy as np
import pandas as pd
import scipy.io as sio</code></pre>
<pre><code class="language-python">pfile = sio.loadmat('./pfile.mat')</code></pre>
<pre><code class="language-python">pfile</code></pre>
<pre><code>{'__globals__': [],
 '__header__': b'MATLAB 5.0 MAT-file, Platform: PCWIN64, Created on: Mon Nov 5 16:59:22 2018',
 '__version__': '1.0',
 'pfiles': array([[array(['D:\\work\\jupyterNotebook\\baidu_dianshi\\data\\pre_data\\MWI_KD96UmGQ6KWVeohF.jpg'], dtype='&lt;U76')],
        [array(['D:\\work\\jupyterNotebook\\baidu_dianshi\\data\\pre_data\\MWI_KDDTIKoSiQgwZiUQ.jpg'], dtype='&lt;U76')],
        ...............
        [array(['D:\\work\\jupyterNotebook\\baidu_dianshi\\data\\pre_data\\MWI_s8cXI99nJNDiXXNc.jpg'], dtype='&lt;U76')],
        [array(['D:\\work\\jupyterNotebook\\baidu_dianshi\\data\\pre_data\\MWI_s8gMVZYugHHFmUyg.jpg'], dtype='&lt;U76')]], dtype=object)}</code></pre>
<pre><code class="language-python">pf = pfile['pfiles']</code></pre>
<pre><code class="language-python">pf[0,0]</code></pre>
<pre><code>array(['D:\\work\\jupyterNotebook\\baidu_dianshi\\data\\pre_data\\MWI_KD96UmGQ6KWVeohF.jpg'], dtype='&lt;U76')</code></pre>
<pre><code class="language-python">pPre = sio.loadmat('./cell_str.mat')</code></pre>
<pre><code class="language-python">pPre</code></pre>
<pre><code>{'__globals__': [],
 '__header__': b'MATLAB 5.0 MAT-file, Platform: PCWIN64, Created on: Mon Nov 5 17:12:21 2018',
 '__version__': '1.0',
 'cell_pre': array([[array(['DESERT'], dtype='&lt;U6')],
        [array(['OCEAN'], dtype='&lt;U5')],
        [array(['MOUNTAIN'], dtype='&lt;U8')],
        [array(['MOUNTAIN'], dtype='&lt;U8')],
        [array(['LAKE'], dtype='&lt;U4')],
        [array(['DESERT'], dtype='&lt;U6')],
        [array(['LAKE'], dtype='&lt;U4')],
        [array(['LAKE'], dtype='&lt;U4')],
        [array(['DESERT'], dtype='&lt;U6')],
        [array(['DESERT'], dtype='&lt;U6')],
        ...........
        [array(['OCEAN'], dtype='&lt;U5')],
        [array(['LAKE'], dtype='&lt;U4')],
        [array(['DESERT'], dtype='&lt;U6')]], dtype=object)}</code></pre>
<pre><code class="language-python">cell_pre = pPre['cell_pre']</code></pre>
<pre><code class="language-python">cell_pre[0]</code></pre>
<pre><code>array([array(['DESERT'], dtype='&lt;U6')], dtype=object)</code></pre>
<pre><code class="language-python">pf[0,0]</code></pre>
<pre><code>array(['D:\\work\\jupyterNotebook\\baidu_dianshi\\data\\pre_data\\MWI_KD96UmGQ6KWVeohF.jpg'], dtype='&lt;U76')</code></pre>
<pre><code class="language-python">pf[5,0][0]</code></pre>
<pre><code>'D:\\work\\jupyterNotebook\\baidu_dianshi\\data\\pre_data\\MWI_KGnibGL9KFBpcvjV.jpg'</code></pre>
<pre><code class="language-python">file_name_list = []
for i in range(1000):
    # Drop the 52-character directory prefix and keep only the file name
    file_name_list.append(pf[i,0][0][52:])</code></pre>
<pre><code class="language-python">file_name_list</code></pre>
<pre><code>['MWI_KD96UmGQ6KWVeohF.jpg',
 'MWI_KDDTIKoSiQgwZiUQ.jpg',
 'MWI_KERiJD55HvBKIhmL.jpg',
 'MWI_KF9KpqQNNMVSsUKH.jpg',
 'MWI_KFSnYaR40ro5CsiG.jpg',
 'MWI_KGnibGL9KFBpcvjV.jpg',
 ..........
 'MWI_s8Jos2wcHcmTqMJ6.jpg',
 'MWI_s8cXI99nJNDiXXNc.jpg',
 'MWI_s8gMVZYugHHFmUyg.jpg']</code></pre>
<pre><code class="language-python">cell_pre[0][0][0]</code></pre>
<pre><code>'DESERT'</code></pre>
<pre><code class="language-python">pre_name_list = []
for i in range(1000):
    # Unwrap the nested cell array down to the plain label string
    pre_name_list.append(cell_pre[i][0][0])</code></pre>
<pre><code class="language-python">pre_name_list</code></pre>
<pre><code>['DESERT', 'OCEAN', 'MOUNTAIN', 'MOUNTAIN', .....
 'DESERT', 'FARMLAND', 'OCEAN', 'LAKE', 'DESERT']</code></pre>
<p>Slicing the string: the directory prefix is 52 characters long, so <code>[52:]</code> leaves only the file name.</p>
<pre><code class="language-python">a = 'D:\\work\\jupyterNotebook\\baidu_dianshi\\data\\pre_data\\'
len(a)</code></pre>
<pre><code>52</code></pre>
<pre><code class="language-python">pf[5,0][0][52:]</code></pre>
<pre><code>'MWI_KGnibGL9KFBpcvjV.jpg'</code></pre>
<hr />
<hr />
<hr />
<pre><code class="language-python">import pandas as pd
from pandas import Series, DataFrame</code></pre>
<pre><code class="language-python">frame1 = DataFrame(file_name_list)</code></pre>
<pre><code class="language-python">frame1</code></pre>
<pre><code>0    MWI_KD96UmGQ6KWVeohF.jpg
1    MWI_KDDTIKoSiQgwZiUQ.jpg
2    MWI_KERiJD55HvBKIhmL.jpg
3    MWI_KF9KpqQNNMVSsUKH.jpg
...
10   MWI_KKYSmFI1jRRVBfXF.jpg
...
992  MWI_s2l0T218Fq2lqtFS.jpg
993  MWI_s4BD8Ud5Gu91QwMl.jpg
994  MWI_s5GwvnKVlu0HDG2h.jpg
995  MWI_s5r88rnOtm3KriE1.jpg
996  MWI_s7JexiPnRc2cTiyZ.jpg
997  MWI_s8Jos2wcHcmTqMJ6.jpg
998  MWI_s8cXI99nJNDiXXNc.jpg
999  MWI_s8gMVZYugHHFmUyg.jpg</code></pre>
<p>1000 rows × 1 columns</p>
<pre><code class="language-python">frame1_name = DataFrame(pre_name_list)</code></pre>
<pre><code
class="language-python">frame1_name</code></pre>
<pre><code>0    DESERT
1    OCEAN
2    MOUNTAIN
3    MOUNTAIN
4    LAKE
5    DESERT
6    LAKE
7    LAKE
...
979  OCEAN
980  DESERT
...</code></pre>
<p>1000 rows × 1 columns</p>
<p>Concatenate the two columns.</p>
<pre><code class="language-python">result = pd.concat([frame1, frame1_name], axis=1)
result</code></pre>
<p>Save to a CSV file.</p>
<pre><code class="language-python">result.to_csv("save_data.csv", index=False)</code></pre>
<p>Save without the header row.</p>
<pre><code class="language-python">result.to_csv("save_data2.csv", index=False, header=False)</code></pre>
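<p>The hard-coded slice <code>[52:]</code> used earlier only works while the directory prefix stays exactly 52 characters long. A minimal sketch of a more forgiving variant follows, using only the standard library: <code>ntpath.basename</code> splits Windows-style paths on any operating system. The helper names <code>file_names</code> and <code>to_csv_text</code> are hypothetical, and the two sample rows are rows 0 and 5 from the outputs shown above.</p>

```python
import csv
import io
import ntpath

# Rows 0 and 5 from the notebook outputs above (path and predicted label).
paths = [
    "D:\\work\\jupyterNotebook\\baidu_dianshi\\data\\pre_data\\MWI_KD96UmGQ6KWVeohF.jpg",
    "D:\\work\\jupyterNotebook\\baidu_dianshi\\data\\pre_data\\MWI_KGnibGL9KFBpcvjV.jpg",
]
labels = ["DESERT", "DESERT"]


def file_names(paths):
    """Return the file-name part of each Windows-style path (hypothetical helper)."""
    return [ntpath.basename(p) for p in paths]


def to_csv_text(names, labels, header=None):
    """Pair names with labels and render them as CSV text (hypothetical helper)."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    if header:
        writer.writerow(header)
    writer.writerows(zip(names, labels))
    return buf.getvalue()


names = file_names(paths)
csv_text = to_csv_text(names, labels)  # no header row, like save_data2.csv
print(csv_text)
```

<p>For these paths the result matches the <code>[52:]</code> slice, but it would keep working if the images were moved to a directory with a different path length.</p>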

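<p>The step that moves each image into its class folder rescans the whole name list for every file. A sketch of the same idea with a single dictionary lookup per file is shown below. It assumes the label CSV has the columns <code>img_name</code> and <code>lables</code> as in the notebook; the function name <code>sort_images_by_label</code>, the demo file names, and the temporary directory layout are made up for illustration.</p>

```python
import csv
import shutil
import tempfile
from pathlib import Path


def sort_images_by_label(image_dir, label_csv, out_dir):
    """Move each labelled file from image_dir into out_dir/<label>/ (hypothetical helper)."""
    image_dir, out_dir = Path(image_dir), Path(out_dir)
    # Build a file name -> label lookup once instead of rescanning the list per file.
    with open(label_csv, newline="", encoding="utf-8") as f:
        label_of = {row["img_name"]: row["lables"] for row in csv.DictReader(f)}

    moved = 0
    # sorted() materialises the listing first, so moving files is safe while looping.
    for path in sorted(image_dir.iterdir()):
        label = label_of.get(path.name)
        if label is None:
            continue  # no label row for this file; leave it where it is
        target = out_dir / label
        target.mkdir(parents=True, exist_ok=True)  # create the class folder on demand
        shutil.move(str(path), str(target / path.name))
        moved += 1
    return moved


# Tiny demo with made-up file names in a temporary directory.
demo_root = Path(tempfile.mkdtemp())
demo_images = demo_root / "images"
demo_images.mkdir()
for name in ("a.jpg", "b.jpg", "c.jpg"):
    (demo_images / name).write_bytes(b"")
demo_csv = demo_root / "labels.csv"
demo_csv.write_text("img_name,lables\na.jpg,OCEAN\nb.jpg,CITY\n", encoding="utf-8")

moved_count = sort_images_by_label(demo_images, demo_csv, demo_root / "sorted")
print(moved_count)
```

<p>Creating each class folder on demand also avoids the case where <code>shutil.move</code> is given a destination folder that does not exist yet and renames the file instead.</p>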