Mastering Network Science with R: A Summer 2025 Tutorial Inspired by ECE 232E Projects
Learn network science concepts through hands-on R examples inspired by ECE 232E summer 2025 projects. Covers Erdős–Rényi models, preferential attachment, random walks, and community detection with timely analogies.
Introduction: Why Network Science Matters in 2025
Network science is everywhere—from social media algorithms to epidemic modeling and even the architecture of large language models like GPT-5. In summer 2025, as AI apps and decentralized finance (DeFi) platforms continue to explode, understanding random networks, preferential attachment, and random walks has never been more relevant. This tutorial walks you through core concepts from the ECE 232E projects at UCLA, using R and the igraph package. You'll build intuition for Erdős–Rényi (ER) models, scale-free networks, and community detection—without copying any assignment solutions.
1. Generating Random Networks with the ER Model
1.1 Creating Undirected ER Networks
The Erdős–Rényi model is the simplest random graph: given n nodes, each pair is connected independently with probability p. For n = 900, you'll explore p values like 0.002, 0.006, 0.012, 0.045, and 0.1. In R, use sample_gnp(n, p) from igraph; erdos.renyi.game(n, p, type="gnp") is the older interface to the same model. Plot the degree distributions and identify their shape. A node's degree is exactly binomial, Bin(n - 1, p); when np is small it is well approximated by a Poisson distribution, and as np grows it becomes increasingly bell-shaped (Gaussian). Compare the empirical mean and variance to the theoretical values: mean = (n - 1)p ≈ np, variance = (n - 1)p(1 - p).
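A minimal sketch of this experiment, assuming a recent igraph release that provides sample_gnp(). The seed 232 and the single realization per p are arbitrary choices for illustration, not values from the course.

```r
library(igraph)

set.seed(232)
n <- 900
ps <- c(0.002, 0.006, 0.012, 0.045, 0.1)

# One realization per p: empirical degree mean and variance next to the
# binomial values (n - 1) p and (n - 1) p (1 - p).
stats <- t(sapply(ps, function(p) {
  k <- degree(sample_gnp(n, p))
  c(p = p,
    mean_emp = mean(k), mean_theory = (n - 1) * p,
    var_emp = var(k), var_theory = (n - 1) * p * (1 - p))
}))
print(round(stats, 4))

# Degree histogram for one value of p (degree_distribution() starts at k = 0).
g <- sample_gnp(n, 0.012)
barplot(degree_distribution(g), names.arg = 0:max(degree(g)),
        xlab = "degree k", ylab = "fraction of nodes")
```

With a single realization the empirical values fluctuate around the theory; averaging over several networks per p tightens the agreement.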
1.2 Connectivity and Giant Components
Not all ER networks are connected. Use is_connected() (is.connected() in older igraph code) to check. For each p, estimate the probability of connectivity as the fraction of connected networks over many realizations. Find the giant connected component (GCC) with components() (clusters() in older code) and compute its diameter using diameter(). As p crosses the threshold p ≈ 1/n (≈ 0.0011 for n = 900), a GCC emerges. This is analogous to the 'viral threshold' in social media trends: once enough users share content, it spreads to a large component.
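The sketch below estimates P(connected) by Monte Carlo and measures the GCC diameter. The helper giant_component() is hypothetical (defined here, not part of igraph), and 200 realizations per p is an assumed budget; raise it for smoother estimates.

```r
library(igraph)

set.seed(232)
n <- 900
ps <- c(0.002, 0.006, 0.012, 0.045, 0.1)
n_trials <- 200  # realizations per p (assumed budget)

# Monte Carlo estimate of P(connected): share of connected realizations.
p_connected <- sapply(ps, function(p) {
  mean(replicate(n_trials, is_connected(sample_gnp(n, p))))
})

# Hypothetical helper: the giant connected component as its own graph.
giant_component <- function(g) {
  comp <- components(g)
  induced_subgraph(g, which(comp$membership == which.max(comp$csize)))
}

# Diameter of the GCC, one realization per p.
gcc_diam <- sapply(ps, function(p) diameter(giant_component(sample_gnp(n, p))))

print(data.frame(p = ps, p_connected = p_connected, gcc_diameter = gcc_diam))
```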
1.3 Phase Transition: Sweeping p
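Sweep p from 0 to a value that makes the network almost surely connected (e.g., p_max ≈ 0.02). For each p, generate 100 networks and plot the normalized GCC size (GCC size / n) vs. p, overlaying the average over realizations as a line. You'll see a sharp transition near p ≈ 1/n, where the GCC emerges. Around the connectivity threshold p ≈ ln(n)/n ≈ 0.0076, the last isolated nodes are absorbed, and somewhat above it the network is connected with high probability; by that point the GCC already holds more than 99% of the nodes. This matches theoretical predictions from random-graph (percolation) theory, a concept used in understanding internet resilience and epidemic spread.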
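One way to run the sweep, assuming a grid step of 0.0005 (an arbitrary choice) and the 100 realizations per p mentioned above. Each grey dot is one realization, the black line is the average, and the dashed lines mark 1/n and ln(n)/n.

```r
library(igraph)

set.seed(232)
n <- 900
p_grid <- seq(0, 0.02, by = 0.0005)
n_real <- 100  # realizations per p

# Normalized GCC size (largest component / n) for every realization.
gcc_frac <- sapply(p_grid, function(p) {
  replicate(n_real, max(components(sample_gnp(n, p))$csize) / n)
})  # matrix: n_real rows, one column per p

matplot(p_grid, t(gcc_frac), pch = ".", col = "grey50",
        xlab = "p", ylab = "GCC size / n")
lines(p_grid, colMeans(gcc_frac), lwd = 2)
abline(v = c(1 / n, log(n) / n), lty = 2)  # 1/n and ln(n)/n
```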
1.4 Scaling with n
Fix average degree c = np = 0.5, 1, 1.15, 1.25, 1.35 and vary n from 100 to 10000. Plot expected GCC size vs. n. For c < 1, the largest component stays small, growing only logarithmically with n; at the critical point c = 1 it grows like n^(2/3); for c > 1 it grows linearly with n, occupying a fixed fraction of the nodes. This is like the 'network effect' in DeFi: once a critical mass of users is reached (c > 1), the platform's value scales with the user base.
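A sketch of the scaling experiment under assumed settings: six values of n, 20 realizations per (c, n) pair, and p = c/n. The log-scaled x-axis is only a presentation choice.

```r
library(igraph)

set.seed(232)
cs <- c(0.5, 1, 1.15, 1.25, 1.35)
ns <- c(100, 500, 1000, 2500, 5000, 10000)
n_real <- 20

# Average largest-component size for each (c, n) pair, with p = c / n.
gcc_size <- sapply(cs, function(avg_deg) {
  sapply(ns, function(n) {
    mean(replicate(n_real, max(components(sample_gnp(n, avg_deg / n))$csize)))
  })
})  # rows: n, columns: c
dimnames(gcc_size) <- list(n = ns, c = cs)
print(round(gcc_size, 1))

matplot(ns, gcc_size, type = "b", pch = 1:5, log = "x",
        xlab = "n", ylab = "expected GCC size")
legend("topleft", legend = paste("c =", cs), pch = 1:5, col = 1:5, lty = 1:5)
```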
2. Preferential Attachment and Scale-Free Networks
2.1 Building a Barabási–Albert Network
Use sample_pa(n=1050, m=1, directed=FALSE) to create a network where new nodes attach preferentially to high-degree nodes. This yields a power-law degree distribution, common in real networks like the web or citation graphs. Is it always connected? Yes, because each new node attaches to an existing node, so the graph remains connected—similar to how new users on a social platform always follow someone, keeping the network intact.
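A quick check of the connectivity argument, with an arbitrary seed: a tree on 1050 nodes must have exactly 1049 edges and a single component.

```r
library(igraph)

set.seed(232)
g <- sample_pa(n = 1050, m = 1, directed = FALSE)

# With m = 1 each new node brings exactly one edge to an existing node.
tree_check <- c(nodes = vcount(g), edges = ecount(g),
                components = components(g)$no, max_degree = max(degree(g)))
tree_check
```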
2.2 Community Detection and Assortativity
Apply fast greedy community detection (cluster_fast_greedy()) and compute modularity (modularity()). Degree assortativity, the Pearson correlation between the degrees at the two ends of an edge, measures whether nodes tend to connect to others of similar degree; compute it with assortativity_degree(). Repeat for n = 10500 and compare: for these m = 1 networks, which are trees, modularity tends to be higher in the larger network, since a bigger tree can be cut into more well-separated communities. This mirrors how large online communities (e.g., Reddit) develop distinct subreddits over time.
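A minimal comparison of the two sizes. pa_summary() is a hypothetical helper written for this example; it uses length() on the communities object to count communities.

```r
library(igraph)

set.seed(232)

# Hypothetical helper: community count, modularity and degree
# assortativity for one BA network of size n.
pa_summary <- function(n, m = 1) {
  g <- sample_pa(n, m = m, directed = FALSE)
  comm <- cluster_fast_greedy(g)
  c(n = n,
    communities = length(comm),
    modularity = modularity(comm),
    assortativity = assortativity_degree(g, directed = FALSE))
}

pa_stats <- rbind(pa_summary(1050), pa_summary(10500))
print(round(pa_stats, 3))
```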
2.3 Degree Distributions and Linear Regression
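Plot the degree distribution on log-log axes. Use lm(log(degree_dist) ~ log(degree)) to estimate the slope, after dropping degrees with zero frequency (log(0) = -Inf breaks the fit). For the Barabási–Albert model, P(k) ∝ k^(-γ) with γ ≈ 3, so for m = 1 the slope should be near -3. A straight fit on the raw histogram often comes out shallower, because the sparse, noisy tail pulls it up; logarithmic binning can reduce this bias. Compare n = 1050 and n = 10500: the slope should be similar, consistent with scale-free behavior. This is like the 'rich-get-richer' phenomenon in viral TikTok trends: a few accounts dominate while most have few followers.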
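The regression can be sketched as follows, assuming a naive fit on the raw (unbinned) histogram. fit_slope() is a hypothetical helper; its slope may come out shallower than -3 for the reason given above, so treat it as a rough estimate.

```r
library(igraph)

set.seed(232)

# Hypothetical helper: least-squares slope of log P(k) against log k,
# skipping k = 0 and empty degree bins (log(0) = -Inf breaks lm()).
fit_slope <- function(g) {
  dd <- degree_distribution(g)  # dd[k + 1] = fraction of nodes with degree k
  k <- seq_along(dd) - 1
  keep <- k > 0 & dd > 0
  unname(coef(lm(log(dd[keep]) ~ log(k[keep])))[2])
}

graphs <- lapply(c(1050, 10500), function(n) sample_pa(n, m = 1, directed = FALSE))
slopes <- sapply(graphs, fit_slope)
slopes

# Log-log view of the larger network's degree distribution.
dd <- degree_distribution(graphs[[2]])
k <- seq_along(dd) - 1
plot(k[k > 0 & dd > 0], dd[k > 0 & dd > 0], log = "xy",
     xlab = "degree k", ylab = "P(k)")
```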
2.4 The Friendship Paradox
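Randomly pick a node, then a random neighbor of it, and record the neighbor's degree. Plot this neighbor-degree distribution on log-log axes. It is shifted toward high degrees: a node of degree k can be reached through k edges, so its probability is roughly proportional to k P(k), and the tail decays more slowly (a shallower slope, about -2 instead of -3). This illustrates the 'friendship paradox': on average, your friends have more friends than you do. This effect is used in recommendation algorithms and epidemic surveillance.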
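A sampling sketch of the friendship paradox on a BA network with n = 1050, m = 1; the 5000 samples and the seed are assumed values. In the plot, the red triangles (sampled neighbors) should lie above the black circles (all nodes) in the high-degree tail.

```r
library(igraph)

set.seed(232)
g <- sample_pa(1050, m = 1, directed = FALSE)
deg <- degree(g)
n_samples <- 5000

# Uniformly random node, then a uniformly random neighbor of it;
# record the neighbor's degree.
nbr_deg <- replicate(n_samples, {
  v <- sample.int(vcount(g), 1)
  nb <- as.vector(neighbors(g, v))
  deg[nb[sample.int(length(nb), 1)]]
})

comparison <- c(mean_node_degree = mean(deg),
                mean_neighbor_degree = mean(nbr_deg))
comparison

# Empirical P(k) for all nodes (black) and for sampled neighbors (red).
p_node <- table(deg) / length(deg)
p_nbr <- table(nbr_deg) / length(nbr_deg)
plot(as.numeric(names(p_node)), as.numeric(p_node), log = "xy",
     xlim = range(deg), ylim = range(c(p_node, p_nbr)),
     xlab = "degree k", ylab = "P(k)")
points(as.numeric(names(p_nbr)), as.numeric(p_nbr), col = "red", pch = 2)
```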
2.5 Age vs. Degree
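For each node, record its arrival time t_i (the time step at which it was added) and its final degree. Plot degree vs. t_i: older nodes (small t_i) have higher degree, because they have had longer to accumulate links and preferential attachment amplifies that head start. Fit a power law, degree ∝ t_i^(-β); the mean-field BA prediction k_i(t) = m (t/t_i)^(1/2) gives β = 1/2. This explains why early adopters in crypto (e.g., Bitcoin) hold disproportionate influence.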
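A sketch of the age analysis. It relies on sample_pa() adding vertices in order, so vertex ID i doubles as the arrival time t_i. Averaging degrees over 50 realizations (an assumed count) smooths the noise before the log-log fit.

```r
library(igraph)

set.seed(232)
n <- 1050
n_real <- 50

# Mean degree of the i-th arriving vertex, averaged over realizations.
deg_by_arrival <- rowMeans(replicate(n_real,
  degree(sample_pa(n, m = 1, directed = FALSE))))
arrival <- seq_len(n)

# Slope of log(degree) vs log(arrival time); mean-field theory suggests -1/2.
fit <- lm(log(deg_by_arrival) ~ log(arrival))
beta_hat <- unname(coef(fit)[2])
beta_hat

plot(arrival, deg_by_arrival, log = "xy",
     xlab = "arrival time t_i", ylab = "mean degree")
lines(arrival, exp(fitted(fit)), col = "red")
```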
2.6 Varying m
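Repeat with m=2 and m=6. Larger m raises the average degree (about 2m) but, in the BA model, leaves the theoretical exponent at γ ≈ 3 (fitted slopes can still vary at finite n). Modularity typically drops, because denser graphs with many cross-links have less clear-cut communities than the m = 1 trees. This is analogous to adding more connections per new user, like a platform that encourages following multiple accounts upon signup.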
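A compact comparison across m, assuming n = 1050 as in the earlier sections. igraph's default sample_pa() algorithm ("psumtree") does not create multi-edges, so cluster_fast_greedy() can run directly on the result.

```r
library(igraph)

set.seed(232)
n <- 1050

# Average degree and fast-greedy modularity for m = 1, 2, 6.
by_m <- t(sapply(c(1, 2, 6), function(m) {
  g <- sample_pa(n, m = m, directed = FALSE)
  c(m = m,
    mean_degree = mean(degree(g)),
    modularity = modularity(cluster_fast_greedy(g)))
}))
print(round(by_m, 3))
```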
2.7 Stub Matching vs. Preferential Attachment
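Generate a network with n=1050, m=1. Extract its degree sequence and create a new network via stub matching (sample_degseq()). Stub matching can produce self-loops and multi-edges, so remove them (e.g., with simplify()) before running cluster_fast_greedy(), which does not accept multi-edges. Compare community structure and modularity. Stub matching preserves the degree sequence but randomizes who connects to whom, often resulting in lower modularity. This highlights the role of microscopic growth mechanisms in shaping network structure.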
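A sketch of the comparison. Calling sample_degseq() with its default method performs stub matching; simplify() then drops self-loops and multi-edges, which slightly lowers some degrees, so the simplified graph only approximately keeps the original sequence.

```r
library(igraph)

set.seed(232)
g_pa <- sample_pa(1050, m = 1, directed = FALSE)

# Stub matching on the same degree sequence, then clean up loops and
# multi-edges so fast greedy can run.
g_stub <- simplify(sample_degseq(degree(g_pa)))

q <- c(pa = modularity(cluster_fast_greedy(g_pa)),
       stub = modularity(cluster_fast_greedy(g_stub)))
round(q, 3)
```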
3. Modified Preferential Attachment with Age Penalty
Extend the model to include age: the probability of attaching to node i is ∝ (c k_i^α + a)(d l_i^β + b), where k_i is the degree and l_i the age of node i. With parameters m=1, α=1, β=-1, a=c=d=1, b=0, use igraph's sample_pa_age() or implement the process manually. Plot the degree distribution: it may still look like a power law, but with a different exponent. Compute modularity using fast greedy. This model mimics platforms that boost new content (e.g., Instagram's algorithm favoring recent posts).
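A sketch using igraph's sample_pa_age(). The mapping of the text's symbols to arguments (pa.exp = α, aging.exp = β, zero.deg.appeal = a, zero.age.appeal = b, deg.coef = c, age.coef = d) follows the igraph documentation. Note that igraph counts k_i as the edges a vertex has received (not the one it created) and groups ages into aging.bin bins (default 300), so results may differ slightly from a hand-written implementation.

```r
library(igraph)

set.seed(232)
n <- 1050

# P(attach to i) ~ (c * k_i^alpha + a) * (d * l_i^beta + b) with
# alpha = 1, beta = -1, a = c = d = 1, b = 0 (values from the text).
g_age <- sample_pa_age(n, pa.exp = 1, aging.exp = -1, m = 1,
                       directed = FALSE,
                       zero.deg.appeal = 1,  # a
                       zero.age.appeal = 0,  # b
                       deg.coef = 1,         # c
                       age.coef = 1)         # d

# Log-log degree distribution, skipping empty bins.
dd <- degree_distribution(g_age)
k <- seq_along(dd) - 1
keep <- k > 0 & dd > 0
plot(k[keep], dd[keep], log = "xy", xlab = "degree k", ylab = "P(k)")

q_age <- modularity(cluster_fast_greedy(g_age))
q_age
```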
4. Random Walks on Networks
4.1 Random Walk on ER Networks
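Create an ER network with n=900, p=0.015. Simulate a random walk (no teleportation) starting from a random node. Let s(t) be the shortest-path distance between the start node and the walker's position after t steps; averaging over many walks gives ⟨s(t)⟩ and σ²(t). Plot both vs. t. On a lattice, ⟨s(t)⟩ would grow like √t (diffusion), but this ER graph is a small world with average degree ≈ 13.5: ⟨s(t)⟩ climbs within a few steps and then plateaus near the average shortest-path length, which is below the diameter. This is like a rumor spreading: it reaches distant nodes within a few hops and from then on circulates across the whole network.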
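A minimal random-walk sketch. random_walk_path() is a hypothetical helper written out by hand so the stepping rule is explicit; the walk length (50), the number of walks (500), and restricting the walk to the giant component are assumptions.

```r
library(igraph)

set.seed(232)
g <- sample_gnp(900, 0.015)
comp <- components(g)
g <- induced_subgraph(g, which(comp$membership == which.max(comp$csize)))
adj <- lapply(seq_len(vcount(g)), function(v) as.vector(neighbors(g, v)))

# Simple random walk without teleportation: each step moves to a
# uniformly chosen neighbor. Returns the start plus `steps` positions.
random_walk_path <- function(start, steps) {
  path <- integer(steps + 1)
  path[1] <- start
  for (i in seq_len(steps)) {
    nb <- adj[[path[i]]]
    path[i + 1] <- nb[sample.int(length(nb), 1)]
  }
  path
}

steps <- 50
n_walks <- 500
# Row w, column t + 1: distance from walk w's start to its position at step t.
s <- t(replicate(n_walks, {
  start <- sample.int(vcount(g), 1)
  distances(g, v = start)[1, random_walk_path(start, steps)]
}))

s_mean <- colMeans(s)
s_var <- apply(s, 2, var)
par(mfrow = c(1, 2))
plot(0:steps, s_mean, type = "l", xlab = "t", ylab = "<s(t)>")
plot(0:steps, s_var, type = "l", xlab = "t", ylab = "variance of s(t)")

rw_summary <- c(plateau = mean(tail(s_mean, 10)),
                mean_distance = mean_distance(g),
                diameter = diameter(g))
rw_summary
```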
4.2 Degree Distribution of Visited Nodes
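Measure the degree distribution of the nodes reached by the walk. High-degree nodes are visited more often: on a connected undirected graph, the walk's stationary distribution is π_i = k_i / (2|E|), so a node's visit frequency is proportional to its degree. This is the 'random walk bias' behind PageRank, which on an undirected graph without teleportation reduces to exactly this degree-proportional distribution. Compare with the graph's degree distribution: the walk's distribution, roughly proportional to k P(k), is shifted toward higher degrees.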
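A sketch comparing visited degrees with the graph's degrees, using one long hand-written walk of 50000 steps (an assumed length) on the giant component. The third entry of deg_compare is the stationary-distribution prediction sum(k²)/sum(k).

```r
library(igraph)

set.seed(232)
g <- sample_gnp(900, 0.015)
comp <- components(g)
g <- induced_subgraph(g, which(comp$membership == which.max(comp$csize)))
deg <- degree(g)
adj <- lapply(seq_len(vcount(g)), function(v) as.vector(neighbors(g, v)))

# One long walk; record the degree of each node it steps onto.
n_steps <- 50000
visited_deg <- numeric(n_steps)
cur <- sample.int(vcount(g), 1)
for (i in seq_len(n_steps)) {
  nb <- adj[[cur]]
  cur <- nb[sample.int(length(nb), 1)]
  visited_deg[i] <- deg[cur]
}

# pi_i = k_i / (2|E|) implies E[degree of visited node] = sum(k^2) / sum(k).
deg_compare <- c(graph_mean = mean(deg),
                 walk_mean = mean(visited_deg),
                 theory = sum(deg^2) / sum(deg))
deg_compare

br <- seq(min(deg) - 0.5, max(deg) + 0.5, by = 1)
hist(deg, breaks = br, freq = FALSE, col = rgb(0, 0, 1, 0.4),
     main = "", xlab = "degree")
hist(visited_deg, breaks = br, freq = FALSE, col = rgb(1, 0, 0, 0.4), add = TRUE)
```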
4.3 Larger Network
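Repeat with n=9000. How distances change depends on what you hold fixed. With the same average degree (p ≈ 13.5/n), typical ER distances and the diameter grow only logarithmically, roughly like ln(n)/ln(np), so ⟨s(t)⟩ plateaus a little higher and a few steps later. With the same p = 0.015, the average degree jumps to about 135 and distances actually shrink. Compare σ²(t) in both cases as well. This scaling is crucial for designing efficient search algorithms on the web.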
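A sketch of the two ways to scale up. approx_mean_dist() is a hypothetical helper that averages BFS distances from 50 random source nodes instead of all pairs, which keeps the n = 9000, p = 0.015 case (about 600000 edges) fast.

```r
library(igraph)

set.seed(232)

# Hypothetical helper: mean shortest-path length estimated from BFS runs
# out of n_sources random source nodes (unreachable pairs are skipped).
approx_mean_dist <- function(g, n_sources = 50) {
  src <- sample.int(vcount(g), n_sources)
  d <- distances(g, v = src)
  mean(d[is.finite(d) & d > 0])
}

avg_deg_900 <- 0.015 * 899  # average degree of the n = 900 network
dist_by_n <- c(
  n900         = approx_mean_dist(sample_gnp(900, 0.015)),
  n9000_same_c = approx_mean_dist(sample_gnp(9000, avg_deg_900 / 8999)),
  n9000_same_p = approx_mean_dist(sample_gnp(9000, 0.015))
)
round(dist_by_n, 2)
```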
4.4 Random Walk on Preferential Attachment Networks
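Generate a PA network with n=900, m=1. Simulate the random walk and plot ⟨s(t)⟩ and σ²(t). With m = 1 the PA network is a tree, so there are no shortcuts between branches and typical distances grow roughly like ln n; the ln ln n 'ultra-small-world' scaling applies to scale-free networks with 2 < γ < 3, not to these trees. Compared with the dense ER graph above, expect ⟨s(t)⟩ to rise more gradually and to a higher plateau. The degree distribution of visited nodes is even more skewed (roughly ∝ k P(k) ∝ k^(-2)), so high-degree hubs dominate. This explains why influencers are central in information diffusion on Twitter.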
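Before rerunning the walk on the PA network, it helps to compare the distance scales the walker can reach. A minimal sketch with an arbitrary seed, putting the m = 1 PA tree next to the ER graph from 4.1:

```r
library(igraph)

set.seed(232)
g_pa <- sample_pa(900, m = 1, directed = FALSE)
g_er <- sample_gnp(900, 0.015)

# The PA tree has no shortcuts between branches; compare its distance
# scale with that of the denser ER graph.
dist_scale <- rbind(
  pa = c(mean_distance = mean_distance(g_pa), diameter = diameter(g_pa)),
  er = c(mean_distance = mean_distance(g_er), diameter = diameter(g_er))
)
dist_scale
```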
Conclusion
Network science provides powerful tools to understand complex systems, from social media to financial networks. By implementing these models in R, you gain practical skills for analyzing real-world data. Remember, the key is to experiment with parameters and interpret results—not to copy solutions. As you work through your ECE 232E projects, use this tutorial as a guide to build intuition and write original code.