
TECHNOLOGY

Recommendation Technology

Sourcelight’s methodologies thrive on real-world data. The Discovery Guide is based on the science of Pattern Recognition, which identifies highly predictive patterns in increasingly complex data streams. The Company’s core predictive modeling technology originated from the founders’ extensive experience with data analysis for recognition, chess programs, and credit card fraud detection. Databases for these applications are often extremely large, and traditional Pattern Recognition methods applied to them tend to be both inefficient and inaccurate. A unique combination of existing and new approaches developed by Sourcelight has proven to be much faster, and far more accurate, than any simple refinement of established techniques.

How It Works
Sourcelight’s Pattern Recognition technology combines cutting-edge research with over a decade of real-world experience. Recommendation domains differ in characteristics such as rating scarcity, database size, and taste complexity. To accommodate these differences, Sourcelight’s Discovery Guide architecture features a “Plug-and-Play” recommendation model that incorporates models fine-tuned with the help of domain experts. Sourcelight’s library of recommendation models includes:

Item-based models - Item models generate a prediction for a target item based on a user's ratings for related items. Standard “item-item” models suffer from an unreliable confidence measure and large space requirements. Sourcelight overcomes these problems by using a regression-based model with accurate Bayesian and heuristic confidence metrics. Item models are pruned to optimize memory and CPU performance.
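
The idea of a regression-based item model can be illustrated with a minimal sketch. This is not Sourcelight’s implementation: the function names, the `shrink` parameter, and the simple count-based confidence weight are all assumptions standing in for the Bayesian and heuristic confidence metrics described above.

```python
from statistics import mean

def fit_pair(ratings, source, target):
    """Least-squares fit of target = a + b * source over users who rated both."""
    pairs = [(r[source], r[target]) for r in ratings.values()
             if source in r and target in r]
    n = len(pairs)
    if n < 2:
        return None  # too little overlap to fit a line
    mx = mean(x for x, _ in pairs)
    my = mean(y for _, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    if sxx == 0:
        return (my, 0.0, n)  # source ratings are constant: fall back to the mean
    b = sum((x - mx) * (y - my) for x, y in pairs) / sxx
    return (my - b * mx, b, n)

def predict_item(ratings, user, target, shrink=2.0):
    """Blend per-item regressions, weighting each by a crude confidence.

    The weight n / (n + shrink) is an assumed stand-in for a real
    confidence metric: it grows with the number of co-rating users.
    """
    others = {u: r for u, r in ratings.items() if u != user}
    num = den = 0.0
    for source, value in ratings[user].items():
        if source == target:
            continue
        model = fit_pair(others, source, target)
        if model is None:
            continue
        a, b, n = model
        weight = n / (n + shrink)
        num += weight * (a + b * value)
        den += weight
    return num / den if den else None

# Toy data: ratings[user][item] on a 1-5 scale (hypothetical).
ratings = {
    "ann": {"A": 5, "B": 4},
    "bob": {"A": 3, "B": 3},
    "cat": {"A": 1, "B": 2},
    "dan": {"A": 4},
}
print(predict_item(ratings, "dan", "B"))  # about 3.5 for this toy data
```

Only item pairs with enough co-rating users produce a model, which hints at how pruning weak pairs could reduce memory use.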

Neighborhood-based user models - User neighborhood models generate a prediction for an item based on ratings by users who are similar. “User-user” models extract complex, non-linear patterns from a data set. Sourcelight’s user models feature dynamic neighborhood sizing to minimize the expected error, and a novel distance metric that accounts for both seen and unseen items.
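
A user-neighborhood prediction might be sketched as follows. The distance function here is an assumption, not Sourcelight’s metric: it adds a penalty for items only one of the two users has rated, as one simple way of accounting for both seen and unseen items. The neighborhood size `k` is fixed in this sketch rather than sized dynamically.

```python
def distance(a, b, unseen_penalty=1.0):
    """Mean squared rating gap on shared items, plus a penalty for
    items that only one of the two users has rated (assumed form)."""
    shared = a.keys() & b.keys()
    if not shared:
        return float("inf")  # no common ground to compare on
    only_one = a.keys() ^ b.keys()
    rating_gap = sum((a[i] - b[i]) ** 2 for i in shared) / len(shared)
    overlap_gap = unseen_penalty * len(only_one) / len(a.keys() | b.keys())
    return rating_gap + overlap_gap

def predict_user(ratings, user, target, k=2):
    """Average the target item's ratings over the k nearest users who rated it."""
    profile = {i: r for i, r in ratings[user].items() if i != target}
    candidates = sorted(
        (distance(profile, {i: r for i, r in other.items() if i != target}),
         other[target])
        for name, other in ratings.items()
        if name != user and target in other
    )
    nearest = [r for d, r in candidates[:k] if d != float("inf")]
    return sum(nearest) / len(nearest) if nearest else None

# Toy data (hypothetical): dan resembles ann and bob, not cat.
ratings = {
    "ann": {"A": 5, "B": 1, "C": 5},
    "bob": {"A": 5, "B": 1, "C": 4},
    "cat": {"A": 1, "B": 5, "C": 1},
    "dan": {"A": 5, "B": 2},
}
print(predict_user(ratings, "dan", "C"))  # 4.5
```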

Cluster-based user models - Clustering models pre-compute groups of similar users. When generating a prediction, the engine assigns the user to the closest cluster and uses that cluster’s aggregate information to produce the prediction. Sourcelight incorporates hierarchical clustering models, which can mine huge data sets and generate predictions extremely efficiently. The clustering process employs non-linear ratings normalization to eliminate noise.
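
The prediction step of a cluster model can be sketched as below. The sketch assumes the clusters have already been computed and uses a flat list of clusters rather than a hierarchy; the function names and the plain per-item mean used as “aggregate information” are illustrative assumptions, and no ratings normalization is applied.

```python
def cluster_means(ratings, clusters):
    """Pre-compute each cluster's mean rating per item."""
    means = []
    for members in clusters:
        totals, counts = {}, {}
        for name in members:
            for item, value in ratings[name].items():
                totals[item] = totals.get(item, 0.0) + value
                counts[item] = counts.get(item, 0) + 1
        means.append({item: totals[item] / counts[item] for item in totals})
    return means

def closest_cluster(profile, means):
    """Index of the cluster whose mean ratings best match the profile."""
    def gap(centroid):
        shared = profile.keys() & centroid.keys()
        if not shared:
            return float("inf")
        return sum((profile[i] - centroid[i]) ** 2 for i in shared) / len(shared)
    return min(range(len(means)), key=lambda c: gap(means[c]))

def predict_cluster(profile, target, means):
    """Predict with the closest cluster's mean; None if it never rated the item."""
    return means[closest_cluster(profile, means)].get(target)

# Toy data (hypothetical): two pre-computed clusters of two users each.
ratings = {
    "ann": {"A": 5, "B": 1, "C": 5},
    "bob": {"A": 4, "B": 2, "C": 4},
    "cat": {"A": 1, "B": 5, "C": 1},
    "dov": {"A": 2, "B": 4, "C": 2},
}
means = cluster_means(ratings, [["ann", "bob"], ["cat", "dov"]])
print(predict_cluster({"A": 5, "B": 2}, "C", means))  # 4.5
```

Because the per-cluster means are computed ahead of time, a prediction costs one comparison per cluster rather than one per user, which is where the efficiency of this family of models comes from.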

Sourcelight’s Discovery Guide also employs a novel agent-based method for collecting ratings from users. When asking a user to rate items, the engine balances several goals:

• Asking about items that help improve predictions for a user.

• Asking about items that uncover new aspects of a user’s tastes.

• Asking about items that the user will know and enjoy rating.

• Asking about items that don’t have many ratings in order to improve the overall recommendation model.
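
One simple way to picture balancing the four goals above is a weighted score per candidate item. This is a hedged sketch, not the agent-based method itself: the `Candidate` fields, the weights, and the scarcity formula are all hypothetical, and the input scores are assumed to be supplied by other parts of a system.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    prediction_gain: float  # assumed 0-1: how much a rating would sharpen predictions
    taste_coverage: float   # assumed 0-1: how much new taste territory it probes
    familiarity: float      # assumed 0-1: chance the user knows the item
    rating_count: int       # ratings the item already has across all users

def ask_score(c, weights=(0.4, 0.2, 0.3, 0.1)):
    """Weighted sum over the four goals; the weights are illustrative only."""
    scarcity = 1.0 / (1.0 + c.rating_count)  # favors rarely rated items
    goals = (c.prediction_gain, c.taste_coverage, c.familiarity, scarcity)
    return sum(w * g for w, g in zip(weights, goals))

def next_items_to_ask(candidates, count=1, weights=(0.4, 0.2, 0.3, 0.1)):
    """Return the highest-scoring candidates, best first."""
    ranked = sorted(candidates, key=lambda c: ask_score(c, weights), reverse=True)
    return ranked[:count]

# Hypothetical candidates with made-up scores.
candidates = [
    Candidate("blockbuster", 0.2, 0.1, 0.9, 5000),
    Candidate("cult classic", 0.6, 0.6, 0.6, 200),
    Candidate("obscure title", 0.7, 0.7, 0.05, 3),
]
print([c.name for c in next_items_to_ask(candidates, count=2)])
# ['cult classic', 'obscure title']
```

With these made-up numbers the moderately known title outranks both the blockbuster, which teaches the model little, and the obscure title, which the user is unlikely to know.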

Sophisticated Data Processing
Sourcelight has developed several ways to deal with the common shortcomings present in a large, commercial stream of data:

High-dimensional data sets - Traditional data-mining systems do not deal well with the high-dimensional data sets that are needed to fully characterize products in the real world. The Sourcelight Discovery Guide's approach to variable reduction solves this problem by processing data to a lower dimensional form while preserving the information needed for an accurate description.
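
The source does not say which variable-reduction method the Discovery Guide uses. As a stand-in, the sketch below uses principal component analysis, a well-known technique, and finds the first principal component by power iteration. Function names and the toy data are assumptions.

```python
from math import sqrt

def first_component(rows, iterations=100):
    """Return (column means, unit vector) for the first principal component.

    Uses power iteration on the sample covariance matrix. The start
    vector is uniform; if it happened to be orthogonal to the leading
    component, the iteration would not find it.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    v = [1.0 / sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sqrt(sum(x * x for x in w))
        if norm == 0:
            break  # data has no variance along the current direction
        v = [x / norm for x in w]
    return means, v

def project(row, means, v):
    """Reduce one row to a single coordinate along the component."""
    return sum((row[j] - means[j]) * v[j] for j in range(len(v)))

# Toy data: the second column is exactly twice the first, so one
# coordinate is enough to describe every row.
rows = [[1, 2], [2, 4], [3, 6]]
means, v = first_component(rows)
print([round(project(r, means, v), 3) for r in rows])
```

In this toy case two columns collapse to one coordinate with no loss, since the columns are perfectly correlated. Real data would lose some information, and more components would typically be kept.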

Missing data - Missing data is a very common problem, especially in high dimensions. The Discovery Guide uses sophisticated methods to adapt to missing data.

Noisy data - Noise in the database often corrupts the data analysis. Sourcelight has a wealth of experience in filtering and analyzing large databases to maintain accurate outcomes.

Established and Proprietary Techniques
Different approaches to Pattern Recognition, such as regression analysis, nearest-neighbor algorithms, and clustering algorithms, each have their own strengths and weaknesses. Different types of data, such as customer preferences, transaction histories, and click-stream information, likewise offer different benefits. Sourcelight reconciles these approaches and data types by using stacked generalization, which combines and arbitrates results from various recommendation technologies, including established techniques and proprietary prediction methods. In developing a Pattern Recognition method, Sourcelight uses standard hold-out samples to measure accuracy. In real time, the Discovery Guide delivers a highly confident and accurate prediction to each consumer.
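
Stacked generalization can be illustrated in its simplest form: a second-level model learns, from hold-out data, how much weight to give each base model’s prediction. The sketch below assumes just two base models and a linear blend with no intercept; the function names and numbers are hypothetical, and the source does not describe how Sourcelight’s arbitration is actually built.

```python
def fit_blend(p1, p2, actual):
    """Least-squares weights (w1, w2) so that w1*p1 + w2*p2 fits the actual ratings.

    p1 and p2 are two base models' predictions on a hold-out sample
    that neither model was trained on. Solves the 2x2 normal equations.
    """
    s11 = sum(a * a for a in p1)
    s22 = sum(b * b for b in p2)
    s12 = sum(a * b for a, b in zip(p1, p2))
    t1 = sum(a * y for a, y in zip(p1, actual))
    t2 = sum(b * y for b, y in zip(p2, actual))
    det = s11 * s22 - s12 * s12
    if abs(det) < 1e-12:
        raise ValueError("base predictions are collinear; weights are not unique")
    w1 = (t1 * s22 - t2 * s12) / det
    w2 = (t2 * s11 - t1 * s12) / det
    return w1, w2

def blend(w, x1, x2):
    """Combine two base predictions with the learned weights."""
    return w[0] * x1 + w[1] * x2

# Hypothetical hold-out predictions from two base models, and true ratings.
item_model = [4, 2, 5, 1]
user_model = [3, 3, 4, 2]
truth = [3.25, 2.75, 4.25, 1.75]
w = fit_blend(item_model, user_model, truth)
print(w, blend(w, 3, 5))
```

Fitting the weights on hold-out data rather than on the base models’ own training data keeps the combiner from simply rewarding whichever base model overfits most.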

As the Discovery Guide learns from ever-increasing amounts of data, its predictions become progressively more confident and accurate. The result is a more efficient and accurate predictive modeling engine that dynamically adapts to match the characteristics of the database it is analyzing.

Exceptional Scalability and Performance
The Sourcelight Discovery Guide is designed to be extremely scalable and adaptable. Its n-tier architecture supports parallel processing implementations. The Discovery Guide can utilize standard, off-the-shelf databases, as well as proprietary databases if additional speed is required. Integration with outside systems can be accomplished through simple, open formats and protocols such as XML, plain ASCII, and TCP/IP. Because these are open standards, the Discovery Guide can interface easily with any programming language.
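
To give a flavor of what XML-based integration could look like from a client’s side, here is a sketch using Python’s standard library. The message layout is entirely hypothetical: the element and attribute names (`recommendationRequest`, `user`, `count`, `item`, `id`, `confidence`) are invented for illustration and are not the Discovery Guide’s actual schema.

```python
import xml.etree.ElementTree as ET

def build_request(user_id, count):
    """Serialize a recommendation request as XML text (hypothetical schema)."""
    root = ET.Element("recommendationRequest")
    ET.SubElement(root, "user").text = user_id
    ET.SubElement(root, "count").text = str(count)
    return ET.tostring(root, encoding="unicode")

def parse_response(xml_text):
    """Extract (item id, confidence) pairs from a response (hypothetical schema)."""
    root = ET.fromstring(xml_text)
    return [(item.get("id"), float(item.get("confidence")))
            for item in root.findall("item")]

# A made-up response, as a server might return it over a TCP connection.
sample_response = (
    "<recommendationResponse>"
    '<item id="m101" confidence="0.92"/>'
    '<item id="m205" confidence="0.81"/>'
    "</recommendationResponse>"
)
print(build_request("u42", 2))
print(parse_response(sample_response))
```

Any language with an XML parser and a socket library could perform the same exchange, which is the practical benefit of building on open standards.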

 

 


© 2008 Sourcelight Technologies, Inc.