

9. Secoo

<pre><code class="language-python">from selenium import webdriver
from selenium.webdriver.common.by import By

# Start a Chrome session controlled by Selenium
browser = webdriver.Chrome()</code></pre>

<pre><code class="language-python"># Open the Secoo search results for "Gucci" (page 1)
browser.get('http://search.secoo.com/search?keyword=Gucci&amp;firstcategoryid=30&amp;secondcategoryid=0&amp;thirdcategoryid=0&amp;brandId=0&amp;level=0&amp;orderType=1&amp;filterType=0&amp;source=&amp;pageNo=1&amp;st=10&amp;price=0&amp;prop=0&amp;warehouse=100&amp;actscr=0&amp;expKey=#J_Filter')</code></pre>

<pre><code class="language-python"># Find the "next page" button by its class name and click it.
# find_element(By.CLASS_NAME, ...) replaces find_element_by_class_name,
# which was removed in Selenium 4.
next_page = browser.find_element(By.CLASS_NAME, 'next')
next_page.click()</code></pre>

<pre><code class="language-python"># Grab the rendered HTML of the current page
page_html = browser.page_source</code></pre>

<pre><code class="language-python">from bs4 import BeautifulSoup
import urllib.request
# import pandas as pd
import ssl
import time
import random
import xlsxwriter
import re
import json
import os
import pickle
import socket
import sys
from functools import partial
from multiprocessing import Pool</code></pre>

<pre><code class="language-python"># Parse the page with the lxml parser
Soup = BeautifulSoup(page_html, 'lxml')</code></pre>

<pre><code class="language-python"># Each product contributes one element to each of these three lists
product_titles = Soup.find_all(class_="dl_name")
product_show_tips = Soup.find_all(class_="show_tips")
product_prices = Soup.find_all(class_="dl_price clearfix")

# Fields of the first product
product_img_url = product_show_tips[0].dt.img['src']
product_url = product_titles[0].a['href']
product_title = product_titles[0].a['title']
# The raw text is '\n¥16860': drop the leading newline and the currency sign
product_price = product_prices[0].text.strip().lstrip('¥')</code></pre>

<pre><code class="language-python"># A notebook cell only displays the value of its last expression
len(product_titles)
len(product_show_tips)
len(product_prices)</code></pre>

<pre><code>40</code></pre>

<pre><code class="language-python">len(product_titles)</code></pre>

<pre><code>40</code></pre>

<pre><code class="language-python">len(product_show_tips)</code></pre>

<pre><code>40</code></pre>

<pre><code class="language-python">product_titles[0]</code></pre>

<pre><code>&lt;dd class="dl_name"&gt;
&lt;a href="http://item.secoo.com/42441711.shtml?source=search" id="name_42441711" onclick="analytical('搜索页18','商品');" target="_blank" title="GUCCI/古驰中号Dionysus女士棕色帆布驼色麂皮链条单肩包棕色棕色"&gt;GUCCI/&lt;em&gt;古&lt;/em&gt;&lt;em&gt;驰&lt;/em&gt;中号Dionysus女士棕色帆布驼色麂皮链条单肩包棕色棕色&lt;span class="subtitle"&gt;&lt;/span&gt;&lt;/a&gt;&lt;/dd&gt;</code></pre>

<pre><code class="language-python">product_titles[0].a['href']</code></pre>

<pre><code>'http://item.secoo.com/42441711.shtml?source=search'</code></pre>

<pre><code class="language-python">product_titles[0].a['title']</code></pre>

<pre><code>'GUCCI/古驰中号Dionysus女士棕色帆布驼色麂皮链条单肩包棕色棕色'</code></pre>

<pre><code class="language-python">product_mini_nav = Soup.find_all(class_="mini_nav")</code></pre>

<pre><code class="language-python">product_show_tips = Soup.find_all(class_="show_tips")
product_img_url = product_show_tips[0].dt.img['src']</code></pre>

<pre><code class="language-python">product_show_tips[0].dt</code></pre>

<pre><code>&lt;dt data="" id="propic_42441711"&gt;
&lt;a href="http://item.secoo.com/42441711.shtml?source=search" onclick="analytical('搜索页18','商品');" target="_blank" title="GUCCI/古驰中号Dionysus女士棕色帆布驼色麂皮链条单肩包棕色棕色"&gt;&lt;img alt="GUCCI/古驰中号Dionysus女士棕色帆布驼色麂皮链条单肩包棕色棕色图片" data-original="http://pic11.secooimg.com/product/240/240/55/54/76d45e9bc70e4ec380154e25124894df.jpg" height="240" src="http://pic11.secooimg.com/product/240/240/55/54/76d45e9bc70e4ec380154e25124894df.jpg" style="display: inline;" width="240"/&gt;&lt;/a&gt;&lt;/dt&gt;</code></pre>

<pre><code class="language-python">product_show_tips[0].dt.img['src']</code></pre>

<pre><code>'http://pic11.secooimg.com/product/240/240/55/54/76d45e9bc70e4ec380154e25124894df.jpg'</code></pre>

<pre><code class="language-python"># Keep the price elements in their own variable instead of overwriting product_show_tips
product_prices = Soup.find_all(class_="dl_price clearfix")
product_prices[0]</code></pre>

<pre><code>&lt;dd class="dl_price clearfix"&gt;
&lt;span id="secoo_price_42441711"&gt;&lt;i&gt;¥&lt;/i&gt;16860&lt;/span&gt;&lt;/dd&gt;</code></pre>

<pre><code class="language-python">product_prices[0].text</code></pre>

<pre><code>'\n¥16860'</code></pre>

<pre><code class="language-python"># Strip the leading newline and the currency sign
product_prices[0].text.strip().lstrip('¥')</code></pre>

<pre><code>'16860'</code></pre>
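The scraped price text comes back as `'\n¥16860'`, so it has to be cleaned before it can be used as a number. Below is a minimal sketch of one way to do that for every product, not only the first one. The helper name `parse_price` is hypothetical, and the handling of thousands separators and decimals is an assumption, since the page shown here only contains whole-yuan prices.

```python
import re
from decimal import Decimal


def parse_price(raw):
    # Hypothetical helper: turn scraped price text into a Decimal.
    # Assumption: a comma, if present, is a thousands separator.
    cleaned = raw.replace(",", "")
    # Take the first run of digits, with an optional decimal part.
    match = re.search(r"\d+(?:\.\d+)?", cleaned)
    if match is None:
        raise ValueError(f"no price found in {raw!r}")
    return Decimal(match.group())


# The text of the first price element, as shown in the notebook output.
print(parse_price("\n¥16860"))
```

Searching for the number instead of slicing a fixed count of characters keeps the result correct if the surrounding whitespace or the currency symbol changes.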

Page list
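The search URL used earlier carries a `pageNo=1` query parameter. As a rough sketch, and assuming that parameter alone selects the results page (the notebook only confirms paging by clicking the `next` button), a list of page URLs could be built as follows. The function name `page_urls` is hypothetical, and the example URL is a shortened form of the one in the notebook.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def page_urls(first_page_url, last_page):
    # Hypothetical helper: rewrite pageNo for pages 1..last_page.
    # Assumption: pageNo is the only parameter that changes between pages.
    parts = urlsplit(first_page_url)
    # keep_blank_values keeps empty parameters such as "source=".
    query = parse_qsl(parts.query, keep_blank_values=True)
    urls = []
    for page in range(1, last_page + 1):
        new_query = [
            (key, str(page) if key == "pageNo" else value)
            for key, value in query
        ]
        urls.append(urlunsplit(parts._replace(query=urlencode(new_query))))
    return urls


# Shortened version of the search URL, for readability.
first = "http://search.secoo.com/search?keyword=Gucci&pageNo=1&source=#J_Filter"
for url in page_urls(first, 3):
    print(url)
```

Each URL can then be passed to `browser.get(...)` in turn, as an alternative to clicking through the pages.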

ITEM_HTML