samedi 22 septembre 2012

Networked life

Networked Life - University of Pennsylvania - 


Michael Kearns - Professor of Computer and Information Science

Notes of the course

Made by : Nouri Hamza - Tunisia

21/ 09/2012

N.B : If you find an error , please report it so i can correct it, thanks  :-)

Don't be shy, add your comments and impressions, it will be helpful for me ;-)


I/ Introduction to Networks

What is a Network ?

A network (or graph) is :
* A collection of individuals or entities, each called a vertex or node.
* A list of pairs of vertices that are neighbors, representing edges or links.

* N : Number of Vertices in a network
* Number of possible edges =  N(N-2) 

* The degree of a vertex is its number of neighbors

* The distance between two vertices is the length of the shortest path connecting them
* The diameter of a Network is the average distance between pairs
* There is two types of Network :
   - Undirected networks : Relationship is symetric ,e.g Facebook friendship.
   - Directed networks : e.g Following on Twitter

Erdros Number Project

* It studies research collabaoration among mathematicians.
* There is another example of a research : Last Name Expirement 

II/ Navigation and contagion Networks

Navigation in (social) Networks

Small World Experiment (1969) :
* An arbitrary "target person" and a group of "starting persons" were selected, and an attempt was made to generate an acquitance chain for each starter to the target.
269 Starting persons
64 : completed chain 

* Many of the completed chain passed through a small number of persons (connectors to the target) (hubs of social traffic)

Small worlds : A modern experiment (2003) :
* 18 Targets from 13 countries
* <5% : through any penultimate indivuals
* no evidence of connectors
* interesting navigation of navigation

Contagion and tipping Network

Contagion : Metaphor of viral spread (diffusion, prcolation)
Tipping : sudden, massive spread of contagion (threshold, critical phenomena) e.g
*There is a special value g(value of the thresold)


III/ How do real Network look?

Heavy tails Degree Distribution

e.g of heavy tails degree distribution

Mathematical mode of a typical "bell shaped" distribution :
- Normal or gaussian distribution
- _u : mean/ avrage

* Power law distribution with exponenet _B

we plot log(x) over log(f(x))  -----> A line

Small diameter

* Assume network has a single connected component
* For every pair of vertices U and V , compute shortest path distance d(U,V)

Clustering of connectivity

The clustering coefficient of a network :

Clustering coefficient of a graph G :

Edge density of a Network G :

High clustering  :   CC(G) >>> P

IV / Models of Network formation

The Erdos Renyi Model

 * Formation of the Erdos-Renyi Model :
 1/ Begin wit N isolated vertices
 2/ Add edges gradually (radom undiased )

* In Erdos Renyi model, as N goes larger :
   -  If p < 1/ N  :  Probability of a giant component goes to 0
  -   If p > 1/ N  :  Probability of a giant component goes to 1, and all other components will have size at     
                            most log(N)

* The threshold for diameter G

* Any monotone proprety of Networks exhibits a threshold phenomen in Erdos-Renyi

Clustering Model

* Real World :
  - Big clustering
  - low diameter

Preferantial attachment

* Rich get richer Process :
  - if you have an amount x of something, the probability you get more is propotional to x
  => leads to heavy-tailed distribution

* Preferantial Attrachment :
   1/ Start with two vertices connected by an edge 
   2/ At each step, add one new vertex N with one edge back to previous vertices
   3/ Probability a previously added vertex V receives the new edge from v is propotional to the (current) 
       degree of V
 => Power law distribution


V/ Navigation revisited and the page rank algorithm

Navigation in Network

In order for people(or machines) to find short paths in Network :
* Short paths must exist(structural : small diameter)
* Algorithmic

Kleinsberg's Model :
* Each vertex connected to compass neighbors
* Add a few random "long distance" connections to each vertex
* Propbability p(d) of connecting to a vertex at grid distance d:
Longer r: heavy bias towards "more local" long-distance connection
small r: approach uniformaly random

* if r is too large (strong local bias)
   Short paths may not even exsit
* If r is too small(no local bias)

* Effective search requires a delicate mixture of grid distance (r=2)

Important vertices and the page rank algorithm

* Some purely structural definitions of importance :
- High degree
- Link between communities
- Betweeness centrality
- Google's PageRank Algorithm

* Page Rank idea : Use the link structure of the web

* Directed Networks:
- Each Web/site has "in-links" and "out-links" 
   in-degree : Number of in-links 
  Out-degree : Number of out-links
 -Page rank answer : an important page is pointed to by lots of other important pages

*The page Rank Algorithm (circular)
  1/  . q->p
       . q "distributes" its rank over its out-links
       . p rank :

 2/ Turn the equations into an update rule (algorithmà

* Under broad conditions, this algorithm will converge (updates no longer change any values)


Aucun commentaire:

Enregistrer un commentaire