**A Short Answer**

We live in a world of innovations that have made people's lives easier, and advances in technology have made us increasingly digital. Gone are the days when we planned for days before going shopping, stood in line at a theater for tickets, or waited for friends or relatives to suggest a product. We no longer rely on word of mouth; these days everything is quantified with numbers and statistics, and almost anything is available at the touch of a finger. Online shopping dates back to 1979 and has since grown enormously: electronic commerce is now one of the major contributors to the world economy as well as our go-to option for meeting ever-growing needs. Technological advances have brought a huge inflow of data, and numerous products have been added to online catalogs, which broadens the range of products we can choose from. This is where recommender systems play a major role. Recommender systems apply statistical and mathematical techniques to make product recommendations during a customer's online interaction, and they have achieved great success in e-commerce.

Recommender systems, like many other search systems, make two types of errors: false negatives, which are products that are not recommended even though the customer would like them, and false positives, which are products that are recommended even though the customer does not like them. In the e-commerce domain, the most important errors to avoid are false positives, because they make customers angry. Since an e-commerce site usually has many products that a customer would like to purchase, there is no good reason to risk recommending one that they will not like. Two challenges, scalability and recommendation quality, conflict with one another: the less time an algorithm spends searching for the right neighbors, the more scalable it is, and the worse its recommendations become. For this reason, it is important to treat the two challenges together so that the solutions are both useful and practical.

Recommender systems typically produce recommendations in one of two ways: through content-based filtering or through collaborative filtering. A collaborative filtering model is built from a user's past purchases, the ratings they gave to the purchased products, and similar decisions made by other customers. The model is then used to predict items from the catalog that the user may be interested in. Content-based filtering approaches are more item-specific: they use a series of discrete characteristics of an item in the catalog to recommend additional items with similar characteristics. Combining both approaches yields a hybrid recommender system.

Collaborative filtering is a widely used type of recommender system. It analyzes a large amount of data on users' interests, activities, or choices and then predicts what a user might like based on their similarity to other users. A key advantage of collaborative filtering is that it does not rely on machine-analyzable content, so it can accurately recommend complex items such as movies without requiring an understanding of the item itself. The method assumes that users who agreed in the past will agree in the future. Collaborative filtering is classified into user-based and item-based approaches, as shown in Figure 1.

**Figure 1:** Recommendation algorithm classification [1]

As Aristotle stated, “Man is by nature a social animal; an individual who is unsocial naturally and not accidentally is either beneath our notice or more than human. Society is something that precedes the individual. Anyone who either cannot lead the common life or is so self-sufficient as not to need to and therefore does not partake of society is either a beast or a god” [1]. Recommender systems help build our social networks, that is, the social relations among people who share similar interests, activities, and backgrounds. Social networking sites allow users to share ideas, pictures, posts, activities, events, and interests with people in their network. A model used for recommendation in social networking is the Temporal Context-Aware Mixture Model (TCAM).

**A Long Answer**

**Content-Based Filtering**

Content-based filtering systems rely on user profiles that are created at the outset. Each profile holds information about the user and their tastes, where taste is inferred from how the user has rated items in the past. During recommendation, the engine compares the items the user rated positively with the items the user has not yet rated and looks for similarities. The items most similar to the positively rated ones are recommended to the user.
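The comparison step described above can be sketched in a few lines. This is only an illustration under stated assumptions: the feature vectors, the `like_threshold` cut-off, and the use of cosine similarity against the mean of the liked items are choices made for this example, not something the source specifies.

```python
import numpy as np

def recommend_content_based(item_features, user_ratings, like_threshold=4, top_n=2):
    """Rank unrated items by cosine similarity to the mean feature vector of liked items."""
    rated = set(user_ratings)
    liked = [item for item, rating in user_ratings.items() if rating >= like_threshold]
    if not liked:
        return []
    # The user profile is the average feature vector of positively rated items (an assumption).
    profile = np.mean([item_features[item] for item in liked], axis=0)
    scores = {}
    for item, feats in item_features.items():
        if item in rated:
            continue  # only items the user has not rated are candidates
        denom = np.linalg.norm(profile) * np.linalg.norm(feats)
        scores[item] = float(profile @ feats / denom) if denom else 0.0
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

# Hypothetical catalog; features are [action, comedy, romance].
item_features = {
    "A": np.array([1.0, 0.0, 0.0]),
    "B": np.array([1.0, 1.0, 0.0]),
    "C": np.array([0.0, 0.0, 1.0]),
    "D": np.array([0.9, 0.1, 0.0]),
}
user_ratings = {"A": 5, "C": 1}
recommendations = recommend_content_based(item_features, user_ratings)
```

Here the user liked only item A, so the unrated items are ranked by how closely their features match A's.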

**Collaborative Filtering**

Collaborative filtering has been very successful in both research and practice, but important research remains in overcoming two basic challenges. The first challenge is to improve the scalability of collaborative filtering algorithms. These algorithms can search tens of thousands of potential neighbors in real time, but modern e-commerce systems demand searching millions of potential neighbors. Current algorithms also have performance problems with individual customers for whom the site has large amounts of information. The second challenge is to improve the quality of the recommendations. Consumers need recommendations they can trust to help them find products they will like. If a consumer trusts a recommender system, purchases a product, and then does not like it, the consumer will be unlikely to use the recommender system again.

**User-Based Collaborative Filtering:**

**Item-Based Collaborative Filtering:**

**Pure Item-Based Algorithm**

The pure item-based algorithm represents each item as a vector of user ratings. It looks at the items the user has rated, computes how similar they are to the target item, and selects the k most similar items $\{i_1, i_2, \ldots, i_k\}$. The similarity between items can be calculated with various similarity metrics. The prediction is then computed as the weighted average of the user's ratings on these similar items.

**Item Similarity Computation**

The most important step in the item-based algorithm is computing the similarity between items. The users who have rated both items i and j are isolated, and a similarity computation technique is then applied to calculate the similarity between the two items. Figure 4 depicts this process; the matrix rows represent users and the columns represent items.

**Figure 4:** Item similarity matrix [1]

**Distance similarity metric:** This metric considers the distance between two item vectors. If the distance between the item vectors is large, the similarity value is small. The distance similarity metric is calculated as

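**Cosine similarity metric:** Here the similarity between two item vectors is the cosine of the angle between them, which measures how closely the directions of the vectors agree. If the two item vectors point in the same direction, the value is 1; if they are orthogonal, which is the opposite extreme for non-negative rating vectors, the value is 0. As Figure 4 shows, the similarity between items i and j is computed from the m x n ratings matrix. Writing $\vec{i}$ and $\vec{j}$ for the two item vectors, the cosine similarity is calculated as

$$\mathrm{sim}(i, j) = \cos(\vec{i}, \vec{j}) = \frac{\vec{i} \cdot \vec{j}}{\lVert \vec{i} \rVert \, \lVert \vec{j} \rVert}$$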
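A minimal sketch of the cosine metric on co-rated users follows. It assumes the ratings live in a NumPy users-by-items matrix in which 0 marks a missing rating; that storage convention and the function name `item_cosine_similarity` are assumptions made for illustration.

```python
import numpy as np

def item_cosine_similarity(ratings, i, j):
    """Cosine similarity between item columns i and j, restricted to co-rating users."""
    vi, vj = ratings[:, i], ratings[:, j]
    both = (vi > 0) & (vj > 0)  # isolate users who rated both items
    if not both.any():
        return 0.0
    vi, vj = vi[both], vj[both]
    return float(vi @ vj / (np.linalg.norm(vi) * np.linalg.norm(vj)))

# Rows are users, columns are items; 0 means "not rated" (assumed convention).
ratings = np.array([
    [5.0, 4.0, 0.0],
    [3.0, 0.0, 0.0],
    [4.0, 4.0, 0.0],
    [0.0, 0.0, 2.0],
])
sim_01 = item_cosine_similarity(ratings, 0, 1)  # users 0 and 2 rated both items
sim_02 = item_cosine_similarity(ratings, 0, 2)  # no user rated both items
```

Items with no co-rating users get a similarity of 0 here, which is one possible convention rather than a rule from the source.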

**Common similarity metric:** This metric measures the users' co-occurrence behavior on the given item pair. The common similarity metric is calculated as

The final similarity metric is the product of these three metrics.

**Prediction Computation**

After calculating the similarity between the items, the next step is to compute the predictions for the items. There are two basic techniques for computing the predictions.

**Weighted Sum:** This method computes the prediction on item i for user u from the ratings the user has given to items similar to i. Each rating is weighted by the similarity between items i and j.
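The weighted sum can be sketched as below. The normalization by the sum of absolute similarities is the commonly used form and is assumed here; the dictionary-based inputs and the name `predict_weighted_sum` are illustrative.

```python
def predict_weighted_sum(user_ratings, similarities):
    """Predict user u's rating of a target item i.

    user_ratings: {item j: rating given by user u}
    similarities: {item j: similarity between j and the target item i}
    prediction = sum(s_ij * r_uj) / sum(|s_ij|), over rated items j similar to i
    """
    shared = [j for j in user_ratings if j in similarities]
    numerator = sum(similarities[j] * user_ratings[j] for j in shared)
    denominator = sum(abs(similarities[j]) for j in shared)
    return numerator / denominator if denominator else None

# Item j3 has no similarity entry, so it does not contribute.
prediction = predict_weighted_sum({"j1": 4, "j2": 2, "j3": 5}, {"j1": 0.9, "j2": 0.1})
```

Dividing by the summed similarities keeps the prediction inside the range of the user's own ratings.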

**Regression:** This method uses approximations of the ratings obtained from a regression model, and these approximated values are used for the predictions. The vector for the target item is denoted by $R_i$ and the vector for a similar item is denoted by $R_N$. The linear model is represented as

$$\bar{R}'_N = \alpha \bar{R}_i + \beta + \epsilon$$

Here $\epsilon$ is the error of the regression model. The parameters $\alpha$ and $\beta$ are determined from the two rating vectors.

**Nearest-Neighbor Collaborative filtering**

CF systems recommend products to a customer based on the opinions of other customers. These systems employ statistical techniques to find a set of customers, known as neighbors, who have a history of agreeing with the target user: they either rate different products similarly or tend to buy similar sets of products. Once a neighborhood of users is formed, the system uses one of several algorithms to produce recommendations. The entire collaborative filtering process is divided into three sub-tasks: representation, neighborhood formation, and recommendation generation, as shown in Figure 5. The representation task deals with the scheme used to model the products that a customer has already bought. The neighborhood formation task focuses on how to identify the nearest neighboring customers. The recommendation generation task focuses on finding the top-N recommended products from the neighborhood of customers. The rest of this section describes some possible ways of performing these tasks.

**Figure 5:** Three main parts of a recommender system [1]

**Proximity Measure**

The proximity between two customers is measured using either the cosine or the correlation measure.

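**Correlation:** Here the proximity between two users a and b is measured by the Pearson correlation $\mathrm{Sim}(a, b)$, which is given by

$$\mathrm{Sim}(a, b) = \frac{\sum_{i} (r_{a,i} - \bar{r}_a)(r_{b,i} - \bar{r}_b)}{\sqrt{\sum_{i} (r_{a,i} - \bar{r}_a)^2 \, \sum_{i} (r_{b,i} - \bar{r}_b)^2}}$$

where $r_{a,i}$ is the rating of user a on product i, $\bar{r}_a$ is the mean rating of user a, and the sums run over the products rated by both users.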
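As an illustration, the Pearson proximity between two customers can be computed as follows. The sketch assumes rating vectors of equal length in which 0 marks an unrated product, and it takes means over the co-rated products only; both are assumptions of this example.

```python
import numpy as np

def pearson_similarity(ratings_a, ratings_b):
    """Pearson correlation over the products both customers have rated (0 = unrated)."""
    a = np.asarray(ratings_a, dtype=float)
    b = np.asarray(ratings_b, dtype=float)
    both = (a > 0) & (b > 0)
    if both.sum() < 2:
        return 0.0  # not enough overlap to measure agreement
    da = a[both] - a[both].mean()
    db = b[both] - b[both].mean()
    denom = np.sqrt((da ** 2).sum() * (db ** 2).sum())
    return float((da * db).sum() / denom) if denom else 0.0

sim_agree = pearson_similarity([5, 3, 4, 0], [4, 2, 3, 5])   # same ordering of tastes
sim_oppose = pearson_similarity([5, 3, 1, 0], [1, 3, 5, 2])  # reversed ordering
```

Note that the first pair correlates perfectly even though customer b rates everything one point lower, because the correlation compares deviations from each customer's own mean.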

**Cosine:** Here two customers a and b are treated as two vectors in the n-dimensional product space (or the k-dimensional space in the case of the reduced representation). The proximity between them is the cosine of the angle between the two vectors, which is given by

$$\cos(\vec{a}, \vec{b}) = \frac{\vec{a} \cdot \vec{b}}{\lVert \vec{a} \rVert \, \lVert \vec{b} \rVert}$$

**Different Neighborhood Types**

After computing the proximity between customers, the next task is to form the neighborhood. There are various schemes for neighborhood formation; two are discussed here.

**Center-based scheme:** It forms a neighborhood of size l for a particular customer a by simply selecting the l nearest other customers.
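A minimal sketch of the center-based scheme, assuming cosine proximity and a binary customer-by-product purchase matrix (the matrix layout and the name `center_based_neighborhood` are illustrative assumptions):

```python
import numpy as np

def center_based_neighborhood(matrix, a, l):
    """Return the indices of the l customers most cosine-similar to customer a."""
    norms = np.linalg.norm(matrix, axis=1)
    norms[norms == 0] = 1.0  # avoid division by zero for customers with no purchases
    sims = (matrix @ matrix[a]) / (norms * norms[a])
    sims[a] = -np.inf  # a customer is not their own neighbor
    return [int(i) for i in np.argsort(-sims)[:l]]

# Rows are customers, columns are products (1 = purchased).
purchases = np.array([
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
    [1, 0, 0, 0],
], dtype=float)
neighbors = center_based_neighborhood(purchases, a=0, l=2)
```

Customer 1 shares both of customer 0's purchases and customer 3 shares one, so they form the neighborhood; customer 2 shares none.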

**Aggregate neighborhood scheme:** It forms a neighborhood of size l for a customer a by first picking the closest neighbor to a. The remaining l-1 neighbors are selected as follows. Suppose that at a certain point there are m neighbors in the neighborhood N, where m < l. The algorithm computes the centroid of the neighborhood, defined as the vector

$$\vec{C} = \frac{1}{m} \sum_{\vec{V} \in N} \vec{V}$$

The customer outside N that is closest to the centroid is then added as the next neighbor, and the centroid is recomputed; this repeats until the neighborhood has l members. In effect, this scheme lets the nearest neighbors influence the formation of the neighborhood, which can be beneficial for large data sets.

The final step of a CF recommender system is recommendation generation: extracting the top-N recommendations from the neighborhood of customers. There are two different techniques for performing this task.

**Most-frequent Item Recommendation:** It scans the neighborhood N and, for each neighbor, goes through his or her purchase data and performs a frequency count of the products. After all neighbors have been processed, the system sorts the products by frequency count and returns the N most frequent products that the current user has not yet purchased.
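The frequency count described above can be sketched with the standard library. The basket representation (one set of product names per neighbor) and the alphabetical tie-break are assumptions made for this example.

```python
from collections import Counter

def most_frequent_items(neighbor_purchases, already_bought, n):
    """Return the n products bought most often by neighbors and not yet by the user."""
    counts = Counter()
    for basket in neighbor_purchases:
        counts.update(set(basket) - already_bought)  # skip what the user already owns
    # Sort by descending frequency; break ties alphabetically for a stable result.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [product for product, _ in ranked[:n]]

neighbor_purchases = [{"pen", "ink", "pad"}, {"ink", "pad"}, {"ink", "clip"}]
top_n = most_frequent_items(neighbor_purchases, already_bought={"pad"}, n=2)
```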

**Association Rule-based Recommendation:** It builds on the association rule-based top-N recommendation technique, but instead of using the entire population of customers to generate the rule set, it considers only the l neighbors. Considering only a small number of neighbors may not generate a strong enough association rule set, which in turn may leave too few products to recommend. This can be improved by a scheme in which the remaining products, if necessary, are selected with the most-frequent item algorithm. The formula used for prediction of recommendation is

**Hybrid Algorithm**

A hybrid recommender system combines collaborative filtering and content-based filtering in order to achieve better recommendation results. Several studies that compare hybrid methods with pure collaborative and content-based methods show that hybrid methods can provide more accurate recommendations than either pure approach. The hybrid approach can also be used to overcome some common problems in recommender systems, such as cold start and sparsity. Netflix is an example of a hybrid recommender system. The hybrid algorithm combines the similarities of the two pure algorithms into one final hybrid similarity. The hybrid similarity metric is depicted as:

To improve the performance of the algorithm further, a dynamic combination parameter was introduced. The recommendation for user u is then based on the hybrid similarity computed with this dynamic combination parameter.
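To make the idea of a combination parameter concrete, here is one possible sketch. It is not the formula from the cited work: the convex combination, the `weight` parameter, and the `dynamic_weight` rule with its `saturation` constant are all hypothetical choices made only to show how a static and a per-user (dynamic) parameter differ.

```python
def hybrid_similarity(sim_collaborative, sim_content, weight):
    """Blend two similarities; weight=1 is pure collaborative, weight=0 pure content-based."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return weight * sim_collaborative + (1.0 - weight) * sim_content

def dynamic_weight(n_ratings, saturation=20):
    """Hypothetical rule: trust collaborative similarity more as a user rates more items."""
    return min(n_ratings / saturation, 1.0)

static = hybrid_similarity(0.8, 0.4, weight=0.25)                # same weight for every user
new_user = hybrid_similarity(0.8, 0.4, weight=dynamic_weight(5))  # few ratings, leans on content
heavy_user = hybrid_similarity(0.8, 0.4, weight=dynamic_weight(40))
```

A rule of this kind is one way a hybrid could soften the cold-start problem mentioned above, since users with few ratings fall back on content similarity.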

**Case Study: Hybrid Algorithm**

Three datasets with different density levels were taken from the open MovieLens data. First, 600 movies were selected at random and used to filter the original MovieLens dataset. Then 1200 users were selected at random for each density level. For each dataset, items rated by fewer than three users and users who had rated fewer than three movies were discarded. For dataset 1, 3 ratings per user were selected to generate the test set; for datasets 2 and 3, the number of test ratings per user was 5 and 10, respectively. Experimental results on the three datasets for the static and dynamic combination parameters are shown below. The datasets show better results with the dynamic combination parameter.

**Experimental Results:**

**Traditional Data Mining: Association Rule**

Data mining is also known as knowledge discovery in databases (KDD). It is used to extract implicit but useful information from databases. Its two main goals are to save money by discovering potential efficiencies in the business, or to make more money by finding ways to sell more products to customers. For example, companies use data mining to discover which products sell well at which times of the year, so they can manage their retail catalog more effectively, potentially saving millions of dollars a year. Some e-commerce companies use KDD to discover which customers will be interested in a special price offer, reducing the costs of direct mail or telephone campaigns by thousands of dollars a year.

KDD techniques help increase sales of existing catalog products by matching customers to the products they are most likely to purchase. One of the best-known data mining techniques in these systems is the discovery of association rules. The main goal is to find associations between two sets of products in the database such that the presence of products from one set in a transaction implies the presence of products from the other set.

Let us denote a collection of n products $\{P_1, P_2, \ldots, P_n\}$ by P. A transaction $T \subseteq P$ is defined to be a set of products that are purchased together. An association rule between two sets of products A and B, such that $A, B \subseteq P$ and $A \cap B = \emptyset$, states that the presence of the products in set A in a transaction T indicates a strong likelihood that products from set B are also present in T. Such an association rule is often denoted by $A \Rightarrow B$.

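The quality of association rules is commonly evaluated by looking at their support and confidence. The support s of a rule measures the occurrence frequency of the pattern in the rule, while the confidence c measures the strength of the implication. For a rule $A \Rightarrow B$, the support is the fraction of transactions that contain both A and B, and the confidence is the fraction of transactions containing A that also contain B. As equations:

$$s = \frac{\text{number of transactions containing } A \cup B}{\text{number of transactions}}$$

$$c = \frac{\text{number of transactions containing } A \cup B}{\text{number of transactions containing } A}$$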
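Support and confidence are easy to check on a toy example. The sketch below assumes each transaction is a Python set of product names; the product names and helper functions are made up for illustration.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every product in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(transactions, a, b):
    """Among transactions containing A, the fraction that also contain B."""
    support_a = support(transactions, a)
    return support(transactions, set(a) | set(b)) / support_a if support_a else 0.0

transactions = [
    {"bread", "butter"},
    {"bread", "butter", "jam"},
    {"bread"},
    {"milk"},
]
s = support(transactions, {"bread", "butter"})        # 2 of 4 transactions
c = confidence(transactions, {"bread"}, {"butter"})   # 2 of the 3 bread transactions
```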

**Dimensionality reduction representation**

In a standard CF-based recommender system, the input data is a collection of purchase transactions of m customers on n products. It is usually represented as an m x n customer-product matrix R, such that $r_{ij}$ is 1 if the ith customer has bought the jth product, and 0 otherwise. We refer to this m x n representation of the input data as the original representation. This representation is simple, but it can cause problems for collaborative filtering recommender systems, such as:

**Sparsity:** Commercial recommender systems are used to evaluate large product sets; for example, eBay recommends products such as books, clothes, and various electronic gadgets, and CDnow recommends music albums. On these sites, even active customers may have purchased well under 1% of the products (1% of 2 million books is 20,000 books). As a result, a recommender system based on nearest-neighbor or hybrid algorithms may be unable to make any product recommendations for a particular user. This is known as the reduced coverage or sparsity problem.

**Scalability:** Collaborative filtering algorithms need computation that grows with both the number of customers and the number of products. With millions of customers and products, a typical e-commerce recommender system will have scalability problems.

**Synonymy:** Different product names can refer to the same or similar objects. Correlation-based recommender systems cannot discover this latent association and therefore treat these products as different. For example, consider two customers: one purchases 40 different recycled letter pad products and the other purchases 20 different recycled memo pins. A correlation-based recommender system would see no overlap between the two product sets, would compute no correlation, and would be unable to discover the latent association that both customers like recycled products.

These weaknesses of the original data representation led to the exploration of alternative ways of representing the input data. A natural way of representing sparse data sets is to compute a lower-dimensional representation. Essentially, this approach takes the m x n customer-product matrix and uses a truncated singular value decomposition (SVD) to obtain a rank-k approximation of the original matrix. We refer to this as the reduced dimensional representation. This representation has a number of benefits. First, it alleviates the sparsity problem, as all the entries in the m x k matrix are nonzero, which means that all m customers now have opinions on the k meta-products. Second, the scalability problem is reduced: since k << n, both the processing time and the storage requirement improve dramatically. Third, the reduced representation captures latent associations between customers and products in the reduced feature space and thus can potentially remove the synonymy problem.

Apart from the two representations of the input data (high-dimensional and low-dimensional), we consider two different schemes for normalizing the customer vectors in the feature space. In the first scheme, vectors are not normalized and are kept in their original form. In the second scheme, each vector is normalized to unit length. The motivation behind this normalization is to develop a common framework for treating customers that have purchased different numbers of products.
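A small sketch of the reduced representation using NumPy's SVD follows. Scaling the customer coordinates by the singular values is one common choice and is assumed here; the source does not fix the scaling, and the toy matrix and function names are illustrative.

```python
import numpy as np

def reduce_dimensions(R, k):
    """Return the m x k customer representation and the rank-k approximation of R."""
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    customers_k = U[:, :k] * s[:k]   # each row: one customer over k meta-products
    R_k = customers_k @ Vt[:k, :]    # best rank-k approximation of R
    return customers_k, R_k

def normalize_rows(X):
    """Second scheme from the text: scale every customer vector to unit length."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0
    return X / norms

# Toy binary customer-product matrix: 4 customers, 5 products.
R = np.array([
    [1, 1, 0, 0, 1],
    [1, 1, 0, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 1, 1, 1],
], dtype=float)
customers_k, R_k = reduce_dimensions(R, k=2)
unit_customers = normalize_rows(customers_k)
```

Neighborhood formation can then run on the dense rows of `customers_k` (or `unit_customers`) instead of the sparse rows of `R`.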

**Case Study: Dimensionality reduction on two different datasets**

Two different data sets were used to evaluate the different algorithms:

**Movie Data:** MovieLens is a website that supports researchers working on recommender systems. Every week hundreds of users visit MovieLens to rate movies and receive movie recommendations. The site now has over 40,000 users who have expressed opinions on more than 3,500 different movies. For the experiment, 100,000 ratings were selected from the database, considering only users who had rated 20 or more movies. The data was divided into an 80% training set and a 20% test set. The data set was transformed into a binary user-movie matrix R with 1000 rows and 1700 columns. The sparsity level of the movie data set is 0.9369. This data set is named ML.

**E-Commerce:** In addition to the movie data, e-commerce purchase data from Fingerhut, an e-commerce company, was used. This data set contains the purchase records of 6,000 customers on 23,555 catalog products, for a total of 97,050 purchase records. As before, the data set was divided into a training set and a test set using the same 80%/20% train/test ratio.

**Evaluation Metrics:** To evaluate top-N recommendation, we use two metrics widely used in the information retrieval (IR) community: recall and precision. The definitions differ slightly from the standard IR ones. The algorithms are run on the training set and produce a set of recommended products, which we call the Top-N set. The goal is then to look into the test set and match its products against the Top-N set. Products that appear in both sets are members of a special set, which we call the hit set. Recall and precision are defined as follows.

$$\text{recall} = \frac{|\text{hit set}|}{|\text{test set}|}, \qquad \text{precision} = \frac{|\text{hit set}|}{N}$$

These two measures often conflict. For instance, increasing N tends to increase recall but decrease precision. Because both are critical to judging quality, we combine the two. In particular, we use the standard F1 metric, which gives equal weight to both:

$$F1 = \frac{2 \cdot \text{recall} \cdot \text{precision}}{\text{recall} + \text{precision}}$$
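The three metrics can be computed for a single user as sketched below; the function name and the toy product identifiers are assumptions for illustration.

```python
def evaluate_top_n(top_n, test_items):
    """Return (recall, precision, F1) for one user's Top-N list against the test set."""
    hits = set(top_n) & set(test_items)  # the hit set
    recall = len(hits) / len(test_items) if test_items else 0.0
    precision = len(hits) / len(top_n) if top_n else 0.0
    f1 = 2 * recall * precision / (recall + precision) if hits else 0.0
    return recall, precision, f1

# N = 4 recommendations, 3 products in the test set, 2 of them hit.
recall, precision, f1 = evaluate_top_n(["a", "b", "c", "d"], {"a", "d", "x"})
```

Lengthening the list to include "x" would raise recall to 1 while lowering precision, which is the trade-off F1 is meant to balance.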

**Experimental Results**

Dimensionality reduction techniques allow CF-based algorithms to scale to large data sets while still yielding high-quality recommendations [1].

**Temporal Context-Aware Mixture Model (TCAM)**

User behavior in social media systems is influenced by two factors: the user's *intrinsic interest* and the *temporal* context (general public interest). It is often assumed that users prefer items based on their intrinsic interests, but this may not be accurate in many social application scenarios. For instance, when selecting a book to read or a movie to watch, users are likely to prefer books or movies that interest them. In contrast, when choosing news to read or users to follow in a social network (e.g., Twitter), users are more likely to be attracted by breaking news or by famous users who are followed by the general public. Therefore, users' rating behaviors may not necessarily reflect their intrinsic interests, and new models are required to better analyze user behaviors in social media systems. TCAM (Temporal Context-Aware Mixture Model) simultaneously models the topics related to users' intrinsic interests and the topics related to the temporal context, and then integrates the influences of the two factors to model user behaviors in a unified way.

The generative process of user rating behaviors in the TCAM model (Figure 2) is briefly as follows. Suppose a user *u* chooses an item *v* in a time interval *t*. TCAM first tosses a coin, based on the probabilities of the two factors, to decide whether this behavior is influenced by the user's personal interest or by the temporal context. If it results from the user's personal interest, TCAM selects a *user-oriented topic* for *u* based on the user's intrinsic interest. Otherwise, TCAM selects a *time-oriented topic* according to the general public's interest during *t*. The selected topic in turn generates the item *v*.

Figure 6 depicts examples of these two types of topic. In the time-oriented topic, the items are related to a certain event (e.g., “Boston Marathon bombings”). The popularity of the topic rises sharply during a particular time interval (e.g., in April 2013). In the user-oriented topic, the items are about the user's regular interest (e.g., “Pet Adoption”). The temporal distribution of the user-oriented topic does not show any spike-like fluctuation. Hence, TCAM models the user-oriented topics and the time-oriented topics simultaneously. [3]
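The coin-toss process described above can be sketched as a toy sampler. This is a simplified illustration, not the published model: the parameter names (`lam`, `user_topics`, `time_topics`, `topic_items`) and the hand-written distributions are hypothetical, and a real TCAM would learn these distributions from data.

```python
import random

def sample_item(user, t, lam, user_topics, time_topics, topic_items, rng):
    """Sample one item for (user, t) following the two-factor generative story.

    lam[user] is the probability that the behavior is driven by intrinsic interest.
    """
    if rng.random() < lam[user]:
        topic_dist = user_topics[user]   # user-oriented topics
    else:
        topic_dist = time_topics[t]      # time-oriented topics
    topic = rng.choices(list(topic_dist), weights=list(topic_dist.values()))[0]
    item_dist = topic_items[topic]       # the chosen topic generates the item
    return rng.choices(list(item_dist), weights=list(item_dist.values()))[0]

# Hand-written toy distributions (hypothetical).
user_topics = {"u": {"pet adoption": 1.0}}
time_topics = {"2013-04": {"marathon": 1.0}}
topic_items = {
    "pet adoption": {"adopt-a-dog post": 1.0},
    "marathon": {"marathon news": 1.0},
}
by_interest = sample_item("u", "2013-04", {"u": 1.0}, user_topics, time_topics,
                          topic_items, random.Random(0))
by_context = sample_item("u", "2013-04", {"u": 0.0}, user_topics, time_topics,
                         topic_items, random.Random(0))
```

Setting the coin probability to 1 or 0 makes the two branches deterministic, which shows how the same user at the same time can produce either an interest-driven or a context-driven item.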

**Conclusion**

After all the computations and experiments, we conclude that yes, recommender systems do recommend the right products to us. Recommender systems are a powerful technology for extracting additional value for a business from its customer behavior and purchase databases. They help consumers select the best products on e-commerce websites, and they help the business by generating more sales. These systems are rapidly becoming a fundamental tool in e-commerce. New technologies are needed that can dramatically improve how recommender systems handle the scalability, sparsity, and synonymy problems. We have presented various collaborative filtering concepts and experimentally evaluated several algorithmic choices for CF-based recommender systems. From our studies and experimental results, we summarize that:

•The hybrid algorithm is a very good algorithm for movie recommendation.

•Dimensionality reduction techniques handle very large amounts of data efficiently and are recommended for e-commerce sites that sell products.

•The temporal context-aware mixture model (TCAM) unifies users' intrinsic interest and the temporal context to analyze user behavior in social media.

Our results also indicate that dimensionality reduction techniques hold the potential to let CF-based algorithms scale to large data sets while still producing high-quality recommendations.

**Future Work**

We need to investigate why the low-dimensional representation performs differently on large and small data sets. We also need to concentrate on filtering techniques that remove noise from social media data. Finally, we aim to make recommendations more efficiently and to improve scalability.

**References**

1. Sarwar, B., Karypis, G., Konstan, J., & Riedl, J. (2000). Analysis of recommendation algorithms for e-commerce. Proceedings of the 2nd ACM Conference on Electronic Commerce, 158–167.

2. Woerndl, W., & Schlichter, J. Introducing context into recommender systems. Muenchen, Germany: Technische Universitaet Muenchen, 138–140.

3. Chen, C., & Zeng, D. (2012). A dynamic user adaptive combination strategy for hybrid movie recommendation. Proceedings of 2012 IEEE International Conference on Service Operations and Logistics, and Informatics, 172–176.

4. Sarwar, B., Karypis, G., Konstan, J., & Riedl, J. (2001). Item-based collaborative filtering recommendation algorithms. Proceedings of the 10th International Conference on World Wide Web (WWW '01), 285–295. New York, NY, USA: ACM. Available online: http://doi.acm.org/10.1145/371920.372071

5. AlSumait, L., Barbara, D., & Domeniconi, C. (2008). On-line LDA: Adaptive topic models for mining text streams with applications to topic detection and tracking. IEEE Conf. on Data Mining, 993–1022.

6. Yin, H., Cui, B., Chen, L., Hu, Z., & Huang, Z. (2014). A temporal context-aware model for user behavior modeling in social media systems. Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, 1543–1554.