Requirements Summary

Create a program that will read in the text le containing graph specications and write to the output le the required results after performing operations on the graph. The program should be able to create the directed graph from the input le and output the results according to the In-Degree of Centrality.

Usage

Your program must be capable of utilizing a command line argument to specify the input graph le and output le with results.

sna inputFile outputFile

Your program must ensure the user has correctly provided the required command-line arguments and display a usage statement if the provided arguments are incorrect.

Assignment Name

The assignment name for this assignment is: sna

Social Network Analysis and Common Metrics

As social networking is becoming more and more popular among people, there is a growing interest in the study of extracting information from these networks, the so-called social network analysis (SNA). The usage of social network analysis, however, is far beyond that of Facebook or Twitter. The principle of SNA can be applied to other things; such as, nding the source and ow of an infectious disease, and scheduling the optimal production-market distribution for a globalized company.

The basic structure of SNA is a graph. To evaluate the characteristics of a graph and the nodes inside, people gradually develop a set of metrics, some of them are:

Degree of Centrality

In an undirected graph, the degree centrality of a node i (denoted by CD(i)) is the node degree (number of edges), denoted by deg(i).

CD(i) = deg(i)

We can interpret this as the how connected this node is in general sense. In a directed graph, the indegree of centrality comes in two types: the in-degree centrality and the out-degree centrality. The in-degree centrality is number of edges that are coming to the node. The out-degree centrality uses the number of edges that are coming out instead.

Follow-a-bility

This made-up metric tells me who is popular among the root users friends. This is a list of unique users that are followed by the people the root user is following. Followability allows a social network to say, Here are some suggested accounts for you to follow.

Dataset Description: Twitter Accounts and their friends

You will be provided with real Twitter accounts with their friends and for each friend his/hers friends etc. By using this data build up a graph and calculate the In-Degree of Centrality for each user.

Input data format

A sample of the input data is:

github john_stewart
github microsoft
microsoft oracle

The two columns are dening the relationship between two accounts. The user in the second column is following the user in the rst column. Thus, this is a directed graph.

For an empty le, or a le that fails formatting, you can quit with an error message that prints the usage statement or some other statementthe most important part is that you dont segfault.

You can also assume that the graph we give you will be connected; starting at any node, you will be able to visit all other nodes using a breadth-rst or depth-rst search, or a variant thereof.

Task Decomposition

Read in data

The data is in the format of:

Name_1 Name_2
Name_x Name_y

For each line there will be two usernames seperated by three spaces. Each username is no longer than 15 characters and contain only alphanumeric characters(letters A-Z,a-z, numbers 0-9) with the exception of underscores.

Dene a user class representing each node in the graph. For this class there are some essential concepts that must be captured:

  • name: store a username
  • followers: other users following this account
  • following: other users that this account is following

The created user class objects need to be stored properly. Decide a way to store this list of all users from input data sets; class denitions are up to you, but the fundamental structure will reect a directed graph with nodes and edges.

Calculate the In-Degree Centrality for each node

Calculating in-degree centrality is relatively easy. Once the in-degree centrality has been calculated for each user we want to nd the most connected user in the graph (root user). This should be the user with the highest in-degree of centrality. If there is a tie, break it with alphabetical order (case-insensitive). Once this user has been found we want to nd all of the users who are within a depth of 2 from the most connected user who are not already followed by this user, and sort them by their in-degree of centrality.

Write results to output file

The program should output a list of all the users within a depth of 2 of the root user (the user with the greatest degree centraility), that are not already being followed by that user (root user). The list should be printed in descending order of the in-degree of centraility for each one of the accounts. The output written should be in the following format:

Looking for new accounts for Name (indegreeofcentrality) to follow
Name1 (degreeofcentrality)
Name2 (degreeofcentrality)
...

When there is a tie in in-degree of centrality the names of the users should be sorted alphabetically by name (case-insensitive).

Academic Honesty!
It is not our intention to break the school's academic policy. Posted solutions are meant to be used as a reference and should not be submitted as is. We are not held liable for any misuse of the solutions. Please see the frequently asked questions page for further questions and inquiries.
Kindly complete the form. Please provide a valid email address and we will get back to you within 24 hours. Payment is through PayPal, Buy me a Coffee or Cryptocurrency. We are a nonprofit organization however we need funds to keep this organization operating and to be able to complete our research and development projects.