Predicting the Future Location of Multiple-Account Users Across Social Networks

The personal data that people share over social networks bring many opportunities for developing new applications. Some users create multiple accounts on social networks. Correlating user information across social networks helps to design algorithms that predict the future location of the multiple accounts of a single individual. Most social media applications let users share their location with friends and family, and this feature has created a need for location prediction. Prediction is one of the most essential issues to be investigated for mobility management in mobile computing systems. This paper proposes an algorithm for future location prediction of multiple-account users based on frequent pattern mining. The frequent pattern algorithm is applied first. The proposed algorithm was evaluated on the Gowalla and Foursquare datasets. A simple method is presented that infers the user location from publicly available attributes and from the geographic information associated with locatable friends. We find that it is possible to infer the user's city with high accuracy. The proposed prediction algorithm achieved an accuracy of 96% on the Gowalla dataset when the support and confidence were computed for each region that the user visited in a specific time period over several years.


Introduction
Today most people use Social Media Networks (SMN) [1]. There are several social network applications, such as Facebook, Twitter, Instagram, and others. Users rely on different applications for different purposes: Facebook to keep in touch with friends and family members, Twitter to post news, Foursquare for location-based social applications, and LinkedIn for job search [2]. For this reason many users register in multiple social networks. The problem with multiple accounts is that a person may believe they are sending an image or video to a different user when the recipient is in fact the same person using another account; only after a long time does the sender discover that the accounts belong to the same user and not to different users. The benefit of multiple accounts is that the profile information of a user registered in a single social network is incomplete, whereas the profile information of a user registered in multiple social networks is more complete, which helps to improve online services such as information diffusion and business intelligence [3].
Prediction is similar to classification, except that we are trying to predict the value of a numerical variable (e.g., amount of purchase) rather than a class (e.g., purchaser or non-purchaser) [4]. Prediction is an important task in analyzing social networks. Much of the research on mining social network data focuses on using associations to derive interesting information about the network, such as communities, or on labeling the nodes with class labels. For example, in a social network, friend links are continuously created over time. Therefore, a natural question is how to determine or predict future links in the social network. The prediction process may use either the structure of the network or the attribute information in the different modes [5].
The rest of this paper is organized as follows. Section 2 reviews related work, Section 3 gives the basic definitions of the techniques used in this paper, Section 4 presents the structure of the proposed method, and Section 5 presents the experimental results.

Related work
Recent research on predicting user location across Social Media Networks (SMN) uses data mining and machine learning techniques.
1. Salvatore et al. [6] (2011) defined new prediction features based on the properties of the places visited by users. They studied the problem of designing a link prediction system for location-based social networks and described a supervised learning framework that uses these features to predict new links among friends-of-friends and place-friends. The work used the Gowalla dataset, in which about 30% of new links form among "place-friends", that is, users who visit the same places, and showed that the prediction space can be made 15 times smaller while still covering 66% of new links. These results point to a new approach for real-world link prediction systems on LBSNs.
2. Eunjoon et al. [7] (2011) developed a model of human movement in location-based social networks. Although user movement and mobility patterns show a high degree of variation and freedom, they also exhibit structural patterns due to geographic and social constraints, which the authors studied using cell phone location information. The work used three large datasets, two from online location-based social networks and one from social network applications: Facebook, Foursquare, and Gowalla. The research aims to understand human motion and its dynamics: social relationships explain about 10% to 30% of all human movement, while periodic behavior explains 50% to 70%.
3. Tatiana et al. [8] (2012) proposed a method to infer home location in social media networks, where a large volume of personal information (such as professional associations) is becoming available. The study is a large-scale analysis of three popular social networks: Google+, Foursquare, and Twitter. It focuses on inferring the user's home location, which is considered a private attribute, from publicly available attributes and from geographic data linked to locatable friends. The approach achieved an accuracy of around 67% for Foursquare, 72% for Google+, and 82% for Twitter.
4. Ole J. et al. [9] (2013) tested link prediction in mobile social networks on two types of dataset: location-based social network data and Reality Mining records. The location-based social network data include the Gowalla and Brightkite datasets. The Reality Mining data are Call Data Records (CDR), while the Gowalla data are Location Data Records (LDR), which contain location information. These features were used for prediction. The work applied several techniques to the datasets: decision tree, support vector machine (SVM), Naïve Bayes, and logistic regression. The highest accuracy on both the Reality Mining and Gowalla datasets was achieved by the decision tree and logistic regression classifiers; the maximum dataset size was 7,902 and the minimum was 349. The decision tree achieved a precision of 0.99, a recall of 0.96, and an F-measure of 0.97.
5. Shamila et al. [10] (2014) presented a survey of frequent pattern mining for finding relationships among patterns in data streams. The work examines how these algorithms are used to obtain frequent patterns (FP) from large transactional databases. The algorithms covered are the Apriori algorithm, the Frequent Pattern (FP) Growth algorithm, Rapid Association Rule Mining (RARM), the ECLAT algorithm, and Associated Sensor Pattern Mining of Data Stream (ASPMS). The study explains the strengths and weaknesses of each algorithm and compares them on the same database to understand their properties.

Basic Definitions
3.1 Association rule [11]
Association is typically used to find tendencies within data; that is, the technique tries to find groups of items that are commonly found together. The results can be used to determine which items are commonly bought together. Association rule mining finds associations and patterns among items in large databases. The rules can then be used in a variety of ways [4]. Frequent itemsets can be used to create association rules with the help of a measure known as the "confidence" [12].
Support: The support of a rule A ⇒ B (where A and B are items, events, etc.) is defined as the proportion of transactions in the data set that contain the itemset A as well as B:
$$\mathrm{Support}(A \Rightarrow B) = \frac{\text{number of transactions that contain both } A \text{ and } B}{\text{total number of transactions}} \tag{1}$$
[11] Confidence: Let A and B be two groups of items. The confidence conf(A ⇒ B) of the rule A ⇒ B is the conditional likelihood of A ∪ B occurring in a transaction, given that the transaction includes A. Therefore, the confidence conf(A ⇒ B) is defined as follows:
$$\mathrm{conf}(A \Rightarrow B) = \frac{\mathrm{Support}(A \cup B)}{\mathrm{Support}(A)} \tag{2}$$
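As a rough illustration of the two measures, the sketch below computes support as the fraction of transactions containing an itemset and confidence in its standard form, conf(A ⇒ B) = Support(A ∪ B) / Support(A). The function names and the toy transactions (sets of visited regions) are assumptions made for the example and do not come from the paper's implementation.

```python
from typing import FrozenSet, Iterable, List


def support(transactions: List[FrozenSet[str]], itemset: Iterable[str]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    items = frozenset(itemset)
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if items <= t)
    return hits / len(transactions)


def confidence(transactions: List[FrozenSet[str]],
               antecedent: Iterable[str],
               consequent: Iterable[str]) -> float:
    """Support(A u B) / Support(A) for the rule A => B."""
    a = frozenset(antecedent)
    sup_a = support(transactions, a)
    if sup_a == 0:
        return 0.0  # the rule never fires, so report zero confidence
    return support(transactions, a | frozenset(consequent)) / sup_a


# Hypothetical transactions: each one is the set of regions seen together.
transactions = [
    frozenset({"New York", "Queens"}),
    frozenset({"New York", "Brooklyn"}),
    frozenset({"New York"}),
    frozenset({"Boulder"}),
]

print(support(transactions, {"New York"}))                 # 3 of 4 transactions
print(confidence(transactions, {"Queens"}, {"New York"}))  # Queens always co-occurs with New York
```

Returning zero confidence when the antecedent never occurs is a design choice of this sketch; raising an error would be an equally reasonable alternative.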

Apriori algorithm[11]
The Apriori algorithm starts by calculating the supports of the individual items to create the frequent 1-itemsets. The 1-itemsets are combined to generate candidate 2-itemsets, whose support is then calculated, and the frequent 2-itemsets are retained. In general, the frequent itemsets of length k are used to create the candidates of length (k + 1) for increasing values of k. Algorithm (1) represents the general steps of the Apriori algorithm.
A minimum support value is provided to the algorithm. First, the algorithm creates a list of candidate itemsets, which contains all of the itemsets appearing within the dataset. Of the candidate itemsets created, an itemset is determined to be frequent if the proportion of transactions in which it appears is greater than the support value.
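The level-wise procedure described above can be sketched as follows. This is a minimal, unoptimized illustration and not the paper's Algorithm (1): the function name `apriori`, the toy data, and the choice to treat an itemset as frequent when its support is greater than or equal to the threshold are all assumptions of the example.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    if n == 0:
        return {}

    def keep_frequent(candidates):
        # Count each candidate against all transactions and keep the frequent ones.
        result = {}
        for c in candidates:
            sup = sum(1 for t in transactions if c <= t) / n
            if sup >= min_support:
                result[c] = sup
        return result

    # Level 1: frequent single items.
    level = keep_frequent({frozenset([i]) for t in transactions for i in t})
    all_frequent = dict(level)
    k = 1
    while level:
        keys = list(level)
        # Join step: merge frequent k-itemsets that differ by one item.
        candidates = {a | b for a, b in combinations(keys, 2) if len(a | b) == k + 1}
        # Prune step: every k-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in level for s in combinations(c, k))}
        level = keep_frequent(candidates)
        all_frequent.update(level)
        k += 1
    return all_frequent


# Hypothetical transactions: regions visited together.
visits = [
    {"New York", "Queens"},
    {"New York", "Brooklyn"},
    {"New York", "Queens", "Brooklyn"},
    {"Boulder"},
]
for itemset, sup in sorted(apriori(visits, 0.5).items(), key=lambda kv: -kv[1]):
    print(sorted(itemset), sup)
```

With a threshold of 0.5 on this toy data, the pair {Queens, Brooklyn} is infrequent, so the three-item candidate is pruned before its support is ever counted, which is the saving the level-wise scheme is designed to provide.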

Frequent Pattern Algorithm [13]
Frequent Pattern Mining (FPM) is one of the most intensively investigated problems in terms of computational and algorithmic development.
During the first few decades of research in this area, the primary focus of work was to find FPM algorithms with better computational efficiency. An FPM algorithm first populates all length-one frequent patterns in a frequent pattern data store, FP. It then generates a candidate pattern and computes its support in the database. If the support of the candidate pattern is equal to or higher than the minimum support threshold, the pattern is stored in FP. The process continues until all the frequent patterns in the database are found.

Dataset [14]
This study uses two location-based social network (LBSN) datasets: the Gowalla dataset and the Foursquare dataset. Gowalla provides an API for direct access to publicly available data. Each check-in in this dataset contains a user identifier, a location identifier, a timestamp, and GPS coordinates. The coordinates refer to the venue itself, so they are shared among many users and are less identifying. This work used the Gowalla dataset, which consists of (35) attributes and (1000) states.

B. Foursquare Dataset
It is a location-based social network dataset. Foursquare is a widely used and popular service with 30 million users and 3 billion check-ins. The dataset was collected by Gao et al. and comprises 2,073,740 check-ins by 18,107 users at 43,063 locations from August 2010 to November 2011. The social ties were collected directly from Foursquare, and each record contains a user identifier, a place identifier, a timestamp, and GPS coordinates. The location information in this dataset is much more unique than in other datasets. Less unique pieces of information are shared among more users, which matters when the data are exploited to identify users: the GPS location of a user is more unique than the GPS coordinates of the venue itself. The Foursquare dataset consists of (273) states.

Proposed Method (Prediction for future location)
This section describes the proposed algorithm for predicting future location, which is based on association rules, the Apriori algorithm, and frequent pattern mining.

Frequent Pattern Mining (FPM)
The first step is Frequent Pattern Mining, an analytical operation on locations that finds frequent patterns and associations in data sets held in various kinds of databases, such as relational and transactional databases. The operation aims to find how often an item (region) recurs within a specific period, which makes it possible to predict the occurrence of a specific item based on the occurrence of other items in the transaction. Frequent patterns are the regions that appear frequently among those the users visited in a dataset, and a frequent itemset is one made up of such patterns. The steps of Frequent Pattern Mining (FPM) are:

1. Determine the specific time period in which the user made visits.
2. Compute the frequency of each region.
3. Sort the regions from highest to lowest frequency.

Algorithm (2) shows the general steps of prediction for the future location. The input of the algorithm is a dataset (A or B) and the username whose future location is to be predicted. Step 1 initializes the variable matches, whose default value is 0; this variable holds the probability of the regions that the username visited in the specific time period. Step 2 uses a for loop over all records in dataset (A). Step 3 checks whether the username equals the one in the current record: if so, the city name for this username is selected; if not, the loop moves on to the next record. The specific time period is then determined. Step 5 computes the frequent regions for this username, and the algorithm returns sorted(matches) and max(matches), which represent the frequent pattern of regions and the predicted region.
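The three frequent pattern mining steps of this section (fix a time period, count the visits per region, sort by frequency) can be sketched in a few lines. The record layout `(username, city, timestamp)`, the function name `region_frequencies`, and the sample check-ins are assumptions made for illustration; the actual Gowalla and Foursquare records carry more fields.

```python
from collections import Counter
from datetime import datetime


def region_frequencies(checkins, username, start, end):
    """Rank the cities `username` visited between `start` and `end` (inclusive)."""
    counts = Counter(
        city
        for user, city, when in checkins
        if user == username and start <= when <= end  # steps 1 and 2
    )
    return counts.most_common()  # step 3: highest frequency first


# Hypothetical check-ins: (username, city, timestamp).
checkins = [
    ("alice", "New York", datetime(2009, 3, 1)),
    ("alice", "Queens", datetime(2009, 5, 9)),
    ("alice", "New York", datetime(2010, 1, 15)),
    ("bob", "Boulder", datetime(2009, 7, 4)),
    ("alice", "Boulder", datetime(2011, 2, 2)),  # outside the period below
]

ranked = region_frequencies(
    checkins, "alice", datetime(2009, 1, 1), datetime(2010, 12, 31)
)
predicted = ranked[0][0] if ranked else None
print(ranked, predicted)
```

Here the most frequent region in the period is simply taken as the predicted region; the support and confidence refinement is a separate step.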
Figure (1) shows the structure of the proposed system, which follows the same steps as algorithm (2): it starts with a request in which the user enters a username, searches the dataset, determines the specific time period, computes the frequent pattern of regions, and outputs the predicted region. The final step of algorithm (2), Step 9, returns output and max.
For example, algorithm (2) was executed on the Gowalla dataset for one user over the time period 2009-2010. Table (1) and Figure (2) display the frequency pattern for cityname = "New York". The "New York" region shows the highest frequency in both 2009 and 2010, so this region is taken as the item in this method for both years. Once the frequency pattern is found, the support and confidence are computed for each region by applying the Apriori algorithm with minimum support values (0.1, 0.2, 0.3, 0.4, etc.).

In 2010 the support and confidence are: The support of San Juan: 0.

Implementation and Experimental Results
This system discovers the future location of suspect users by following their locations during a specific time period in order to determine their current location. The tool uses two algorithms (association rule and frequent pattern). First, the frequent pattern algorithm is applied to obtain the sequential pattern of locations over a specific time period. Then the association rule algorithm is applied to identify the locations that the suspect user visited repeatedly and to obtain the support and confidence for each region visited by the suspect user. Figure (3) shows the result of prediction for the future location.

Figure (3): Result of prediction for future location
For example, consider the regions visited in 2010 by the user with the username "Fred Wilson". Table (2) shows the support and confidence for each region that the suspect user visited in 2010 and 2009. The highest confidence is for the "New York" region: 87% with support 8.0 in 2010 and 95% with support 4.2 in 2009. The confidence is computed only for the regions with the highest support, and the confidence value is used to predict the future location of suspect users.

Conclusion
In this paper, the proposed method combines two algorithms (frequent pattern mining and association rules). The frequent pattern mining algorithm is used to find the sequence of regions visited in specific time periods. The association rule algorithm is used to find the support and confidence for each region visited by the user whose future location is to be discovered. Applying the two algorithms together yields the region with the highest probability. The accuracy of the proposed method differs from one user to another; the highest accuracy is 96% on the Gowalla dataset.

Algorithm (2): Prediction for Future Location.
Input: Multiple accounts of user, time period.
Output: Frequent pattern of region, predicted region.
Step 1: Initialization: matches = 0, List = [ ] // list to put all regions
Step 2: For each account in multiple user identification: Get (the specific time period)
Step 3: Select city name from dataset A
Step 4: List.append(city name)
Step 5: If List[matches] = List[matches + 1] then matches = matches + 1
Step 6: output = sorted(List)
Step 7: For i = 1 to length(List): compute support of List[i] using equation (1); compute confidence of List[i] using equation (2)
Step 8: max = max(support), max(confidence)
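One possible reading of this algorithm is sketched below. The paper does not spell out how transactions are formed, so the sketch makes explicit assumptions: each check-in in the period is one transaction, support of a city is the share of all check-ins in the period that are the user's visits to that city, and confidence is taken for the rule "user ⇒ city", that is, the share of the user's own check-ins that fall in that city. The function name, record layout `(account, city, year)`, and account names are hypothetical.

```python
from collections import Counter


def predict_future_location(checkins, accounts, start_year, end_year):
    """Return ([(city, support, confidence), ...], predicted_city)."""
    accounts = set(accounts)
    # Steps 2-4: collect the check-ins that fall inside the time period.
    in_period = [(a, c) for a, c, y in checkins if start_year <= y <= end_year]
    total = len(in_period)
    # Steps 5-6: frequency of each city over all accounts of the same user.
    visits = Counter(c for a, c in in_period if a in accounts)
    user_total = sum(visits.values())
    if total == 0 or user_total == 0:
        return [], None
    # Step 7: support (assumed: share of all check-ins) and confidence
    # (assumed: rule "user => city", share of the user's check-ins).
    stats = [(city, n / total, n / user_total) for city, n in visits.most_common()]
    # Step 8: pick the city with the highest confidence, then highest support.
    best = max(stats, key=lambda s: (s[2], s[1]))
    return stats, best[0]


# Hypothetical data: two accounts of one person plus an unrelated user.
checkins = [
    ("user_gowalla", "New York", 2009),
    ("user_gowalla", "New York", 2010),
    ("user_foursquare", "New York", 2010),
    ("user_foursquare", "Queens", 2010),
    ("someone_else", "Boulder", 2009),
]
stats, predicted = predict_future_location(
    checkins, {"user_gowalla", "user_foursquare"}, 2009, 2010
)
print(stats, predicted)
```

Pooling the check-ins of all accounts before counting is what lets a sparse profile on one network be complemented by another; other definitions of support and confidence would fit the same skeleton.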

Figure (1): The structure of the proposed system

3
The support of Philadelphia: 0.1; Newark: 0.1; New York: 3.0; Flushing: 0.1; Queens: 0.2.
The confidence of San Juan: 50%; San juan: 50%; New York: 95%; Queens: 50%.
In 2009 the support and confidence are:
The support of Boulder: 0.8; Brooklyn: 0.2; New York: 4.2; Amagansett: 0.4; Queens: 0.4.
The confidence of Boulder: 25%; Brooklyn: 100%; New York: 95%.

The Gowalla dataset is a location-based social network dataset. In it, users share a location with friends by checking in, so both the locations and the social relations of users are present; this dataset is discontinued. The Gowalla dataset was collected by Cho et al. and includes 6,442,890 check-ins by 196,591 users at 1,280,969 locations, recorded from February 2009 to October 2010.

Table (2): Support and confidence of regions
Brooklyn: 1, New York: 21, Amagansett: 2, Queens: 2]. The region with the highest number of visits is 'New York' (21). From these results for 2009 and 2010, the suspect user is predicted to be found in the New York region.