QRec User Guide


Add New Model

<h3>Example: add CML model</h3>
<p>This example shows how a new model is added to QRec.</p>
<hr />
<h3>Step 1. Prepare configuration file</h3>
<p><strong>Filename:</strong> <code>CML.conf</code>. Add it to the config directory.</p>
<p><strong>Content:</strong> (See the conf files in the config directory for the available parameters and options.)</p>
<pre><code>ratings=./dataset/FilmTrust/trainset.txt
ratings.setup=-columns 0 1 2
model.name=CML
evaluation.setup=-testSet ../dataset/FilmTrust/testset.txt
item.ranking=on -topN 5
num.factors=100
num.max.iter=500
batch_size=500
learnRate=-init 0.001 -max 1
reg.lambda=-u 0.001 -i 0.001
output.setup=on -dir ./results/
CML=-learning_rate 0.1 -margin 0.5 -norm_clip_value 1</code></pre>
<ul>
<li>Line 1 specifies the training set file.</li>
<li>Line 2 describes the column layout of the training set file (0: user, 1: item, 2: rating/click).</li>
<li>Line 3 gives the name of the added model.</li>
<li>Line 4 specifies how the model is evaluated (e.g. with a manually supplied test set or by cross-validation).</li>
<li>Line 5 determines whether rating prediction or item ranking is performed.</li>
<li>Line 6 sets the dimension of the latent factors.</li>
<li>Line 7 sets the maximum number of training iterations.</li>
<li>Line 8 sets the batch size used by TensorFlow-based models.</li>
<li>Line 9 sets the initial learning rate used in model training; -max is the maximum learning rate (only applicable to models implemented with numpy).</li>
<li>Line 10 sets the regularization coefficients (-u for users, -i for items).</li>
<li>Line 11 determines whether and where the results are written.</li>
<li>Line 12 lists the private parameters of the added model.</li>
</ul>
<hr />
<h3>Step 2. 
Write .py file</h3>
<p><strong>Filename:</strong> <code>CML.py</code>. Add it to the model/ranking directory (CML is a model for item ranking).</p>
<p><strong>Reimplement the functions in the base classes if the added model has its own implementation.</strong></p>
<p>Functions that can be reimplemented:</p>
<ul>
<li>readConfiguration()</li>
<li>printAlgorConfig()</li>
<li>initModel()</li>
<li>buildModel()</li>
<li>saveModel()</li>
<li>loadModel()</li>
<li>predict()</li>
<li>predictForRanking()</li>
</ul>
<p>For example:</p>
<pre><code class="language-python">def readConfiguration(self):
    super(CML, self).readConfiguration()
    # Parse the model-specific option line, e.g. "-learning_rate 0.1 -margin 0.5 ..."
    args = config.LineConfig(self.config['CML'])
    self.learning_rate = float(args['-learning_rate'])
    self.margin = float(args['-margin'])
    self.norm_clip_value = int(args['-norm_clip_value'])
    ......

def buildModel(self):
    ......
    # Export the trained embeddings as numpy arrays for prediction
    self.P = sess.run(self.U)
    self.Q = sess.run(self.V)
    ......

def predictForRanking(self, u):
    if self.data.containsUser(u):
        u = self.data.getUserId(u)
        return self.Q.dot(self.P[u])
    else:
        return [self.data.globalMean] * self.num_items</code></pre>
<hr />
<h3>Step 3. Register new model in main.py</h3>
<p>Modify main.py to add CML:</p>
<pre><code class="language-python">if __name__ == '__main__':
    ...
    print('Deep Recommenders:')
    print('d1. APR d2. CDAE d3. DMF d4. NeuMF d5. CFGAN')
    ...
    print('d14. CML')
    ...
    ...
    algorithms = {..., 'd14':'CML', ...}</code></pre>
<p>Then you can run CML by entering d14.</p>
<hr />
<h3>Appendix</h3>
<p>The full implementation of CML:</p>
<pre><code class="language-python">from base.deepRecommender import DeepRecommender
import numpy as np
import random
import tensorflow as tf
from util import config


class CML(DeepRecommender):
    def __init__(self, conf, trainingSet=None, testSet=None, fold='[1]'):
        super(CML, self).__init__(conf, trainingSet, testSet, fold)

    def readConfiguration(self):
        super(CML, self).readConfiguration()
        args = config.LineConfig(self.config['CML'])
        self.learning_rate = float(args['-learning_rate'])
        self.margin = float(args['-margin'])
        self.norm_clip_value = int(args['-norm_clip_value'])

    def initModel(self):
        self.user_id = tf.placeholder(dtype=tf.int32, shape=[None], name='user_id')
        self.item_id = tf.placeholder(dtype=tf.int32, shape=[None], name='item_id')
        self.neg_item_id = tf.placeholder(dtype=tf.int32, shape=[None], name='neg_item_id')
        self.keep_rate = tf.placeholder(tf.float32)
        # Embedding matrices for users (U) and items (V)
        self.U = tf.Variable(tf.random_normal([len(self.data.trainSet_u), self.k], stddev=1 / (self.k ** 0.5)),
                             dtype=tf.float32, name='u')
        self.V = tf.Variable(tf.random_normal([len(self.data.trainSet_i), self.k], stddev=1 / (self.k ** 0.5)),
                             dtype=tf.float32, name='v')
        self.user_embedding = tf.nn.embedding_lookup(self.U, self.user_id)
        self.item_embedding = tf.nn.embedding_lookup(self.V, self.item_id)
        self.neg_item_embedding = tf.nn.embedding_lookup(self.V, self.neg_item_id)
        # Squared distances from the user to the positive and the negative item
        self.pred_distance = tf.reduce_sum(
            tf.nn.dropout(tf.squared_difference(self.user_embedding, self.item_embedding), self.keep_rate), 1)
        self.pred_distance_neg = tf.reduce_sum(
            tf.nn.dropout(tf.squared_difference(self.user_embedding, self.neg_item_embedding), self.keep_rate), 1)
        # Hinge loss: the positive item should be closer than the negative one by at least the margin
        self.loss = tf.reduce_sum(tf.maximum(self.pred_distance - self.pred_distance_neg + self.margin, 0))
        self.optimizer = tf.train.AdagradOptimizer(self.learning_rate).minimize(self.loss, var_list=[self.U, self.V])
        # Clip the norm of every embedding row to norm_clip_value
        self.clip_U = tf.assign(self.U, tf.clip_by_norm(self.U, self.norm_clip_value, axes=[1]))
        self.clip_V = tf.assign(self.V, tf.clip_by_norm(self.V, self.norm_clip_value, axes=[1]))

    def buildModel(self):
        # train
        print('training...')
        with tf.Session() as sess:
            init = tf.global_variables_initializer()
            sess.run(init)
            for iteration in range(self.maxIter):
                for n, batch in enumerate(self.next_batch_pairwise()):
                    user_idx, item_idx, neg_item_idx = batch
                    _, loss, _, _ = sess.run((self.optimizer, self.loss, self.clip_U, self.clip_V),
                                             feed_dict={self.user_id: user_idx, self.item_id: item_idx,
                                                        self.neg_item_id: neg_item_idx, self.keep_rate: 0.98})
                    print('iteration:', iteration, 'loss:', loss)
            # Export the trained embeddings while the session is still open
            self.P = sess.run(self.U)
            self.Q = sess.run(self.V)

    def predictForRanking(self, u):
        'invoked to rank all the items for the user'
        if self.data.containsUser(u):
            u = self.data.getUserId(u)
            return self.Q.dot(self.P[u])
        else:
            return [self.data.globalMean] * self.num_items</code></pre>
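The loss and the norm clipping defined in initModel() above can be checked on toy data without TensorFlow. The following is a minimal NumPy sketch, not part of QRec: the function names cml_hinge_loss and clip_rows_by_norm are hypothetical, and the dropout applied in the appendix code is left out for simplicity.

```python
import numpy as np


def cml_hinge_loss(user_emb, pos_item_emb, neg_item_emb, margin):
    """Sum over the batch of max(d(u, i) - d(u, j) + margin, 0),
    where d is the squared Euclidean distance (hypothetical helper)."""
    pos_dist = np.sum((user_emb - pos_item_emb) ** 2, axis=1)
    neg_dist = np.sum((user_emb - neg_item_emb) ** 2, axis=1)
    return float(np.sum(np.maximum(pos_dist - neg_dist + margin, 0.0)))


def clip_rows_by_norm(emb, clip_value):
    """Rescale every row whose L2 norm exceeds clip_value so that its norm
    equals clip_value; shorter rows are left unchanged (hypothetical helper)."""
    norms = np.linalg.norm(emb, axis=1, keepdims=True)
    scale = np.minimum(1.0, clip_value / np.maximum(norms, 1e-12))
    return emb * scale


# Toy batch of two (user, positive item, negative item) triples
users = np.array([[0.0, 0.0], [1.0, 1.0]])
pos_items = np.array([[1.0, 0.0], [1.0, 1.0]])
neg_items = np.array([[0.0, 1.0], [3.0, 1.0]])

# First triple violates the margin (1 - 1 + 0.5 = 0.5), second does not (0 - 4 + 0.5 < 0)
loss = cml_hinge_loss(users, pos_items, neg_items, margin=0.5)
print(loss)

# Only the first row has a norm above 1, so only it is rescaled
clipped = clip_rows_by_norm(np.array([[3.0, 4.0], [0.3, 0.4]]), 1.0)
print(clipped)
```

Only triples in which the negative item is not at least margin farther away than the positive item contribute to the loss, which is why the second triple adds nothing here.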
