Mailing List Archive
Re: [tlug] Clustering algorithms
- Date: Thu, 14 Feb 2008 23:50:59 +0900
- From: "Dimitar Dimitrov" <dimitar.dimitrov@example.com>
- Subject: Re: [tlug] Clustering algorithms
Hello Dave,

I am not an expert, but here are some ideas. Take them for what they are worth.

It all starts with the data you collect. Let's say that for each session you record the products a user has viewed and the products they bought. We will define a session as "repeated visits from the same IP terminated by a 30-minute pause". Suppose you have a table with columns TIMESTAMP, SESSION_ID, ITEM and ACTION, where SESSION_ID is a unique identifier for the session (say, a counter) and ACTION is an enumerated type with values like 'view', 'add', 'remove', 'buy', etc.

Note that I am not suggesting that you store the user's IP or account details. If you do keep such information, please, please make sure that you put special access restrictions on these tables, use hashing where appropriate and purge as much as possible to offline backup storage.

One algorithm using this model would be to create a graph with products as nodes and weighted edges connecting products that appear in the same session. Reading from the database, you increment the weight of an edge each time both of its products appear in a session. Finally, you go through all the nodes and keep only the top 10 edges by weight for each one, and there is your list of recommended products. This graph can be precalculated (say, once a week) and kept in a database for querying.

You can tweak the algorithm by changing the weighting:

1. Depending on the distance - if product views are closer together on the session timeline, use a heavier weight, or do not add an edge at all if the product views are more than 3 pages apart.
2. Depending on the action - a view adds weight x1, an add x3 and a purchase x10.
3. You can build a separate graph for each action and run more complex queries.

Given that you don't have any objective criteria for correctness, it shouldn't be too difficult to get something that is good enough.

Another possible algorithm would be to use Markov chains to put the emphasis on the order in which products are visited.
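To make the graph idea above a bit more concrete, here is a rough Python sketch of it, including the action weighting from tweak 2. Treat it as one possible reading rather than a finished design. The function names build_graph and recommend, the sample rows and the product names are all made up for illustration. Two things are my own assumptions: 'remove' gets weight 0, and the edge from A to B grows by the strongest action taken on B in that session (so "people who looked at A bought B" counts for more than "people who looked at A also looked at B").

```python
from collections import defaultdict
from itertools import permutations

# Multipliers for view/add/buy come from the text above.
# 'remove' = 0 is an assumption; unknown actions also count as 0.
ACTION_WEIGHT = {"view": 1, "add": 3, "buy": 10, "remove": 0}


def build_graph(rows):
    """Build a weighted, directed co-occurrence graph.

    rows: iterable of (timestamp, session_id, item, action) tuples,
    mirroring the TIMESTAMP, SESSION_ID, ITEM, ACTION columns.
    """
    # Strongest action weight seen for each item within each session.
    strength = defaultdict(dict)
    for _ts, session_id, item, action in rows:
        w = ACTION_WEIGHT.get(action, 0)
        strength[session_id][item] = max(strength[session_id].get(item, 0), w)

    # graph[a][b] grows each time a and b appear in the same session,
    # by the strength of the action on b (an assumed weighting rule).
    graph = defaultdict(lambda: defaultdict(int))
    for items in strength.values():
        for a, b in permutations(items, 2):
            graph[a][b] += items[b]
    return graph


def recommend(graph, item, top_n=10):
    """Return up to top_n neighbours of item, heaviest edge first."""
    neighbours = graph.get(item, {})
    # Sort by descending weight, then by name to keep ties stable.
    ranked = sorted(neighbours.items(), key=lambda kv: (-kv[1], kv[0]))
    return [other for other, weight in ranked if weight > 0][:top_n]


# Hypothetical sample data: three short sessions.
rows = [
    (1, 1, "kettle", "view"),
    (2, 1, "teapot", "view"),
    (3, 1, "teapot", "buy"),
    (4, 2, "kettle", "view"),
    (5, 2, "mug", "add"),
    (6, 3, "kettle", "buy"),
    (7, 3, "teapot", "view"),
]
graph = build_graph(rows)
print(recommend(graph, "kettle"))  # -> ['teapot', 'mug']
```

In a real setup you would read the rows from the database in the weekly batch job and write the top 10 lists back to a table, so the site only ever does a simple lookup. The distance tweak (1) is not covered here; it would need the rows ordered by timestamp within each session.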
You can also investigate using RDF + SPARQL to store the data directly in graph form and query it in real time, or a Prolog-type language, where you can model the graph in the language syntax itself and certain types of queries are easier to implement.

Cheers and good luck,
Dimitar
- Follow-Ups:
- Re: [tlug] Clustering algorithms
- From: Dave M G