Syracuse University · Data Lab

YouTube

A multi-dimensional network consists of several types of interactions, which can be used to study communities shared across heterogeneous interactions.

Network Statistics

Nodes: 15K
Edges: 5.6M
Avg Degree: 369.4
Missing: No
Category: Video
Dataset Details

Source

Lei Tang, Huan Liu




Dataset Information

6 files are included:

1. nodes.csv

-- the file of all users. It serves as a dictionary of the users in this data set, is useful for fast reference, and contains every node id used in the dataset.

2. [1-5]-edges.csv

-- the interactions in CSV format. Each file represents one type of interaction and has three columns: the first two are user ids, and the last is the intensity of the interaction. Here is an example:

1,58,3

The interaction intensity between users 1 and 58 is 3.

The network is symmetric, so each interaction is listed only once. That is, 58,1,3 will not appear if 1,58,3 is already present.
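
Because each symmetric interaction is stored only once, a lookup has to work in either direction. The following is a minimal sketch of one way to read an edge file, assuming the files have no header row and that intensities are numeric; the helper names `load_edges` and `intensity` and the sample rows are hypothetical and not part of the dataset.

```python
import csv
import io


def load_edges(handle):
    """Read 'user_a,user_b,intensity' rows into a dict keyed by an ordered pair."""
    edges = {}
    for row in csv.reader(handle):
        if not row:
            continue  # skip blank lines
        u, v, w = int(row[0]), int(row[1]), float(row[2])
        # Store each undirected edge under (smaller id, larger id).
        edges[(min(u, v), max(u, v))] = w
    return edges


def intensity(edges, u, v):
    """Return the interaction intensity for a pair in either order; 0.0 if absent."""
    return edges.get((min(u, v), max(u, v)), 0.0)


# Hypothetical sample rows standing in for one of the edge files.
sample = io.StringIO("1,58,3\n1,60,1\n")
edges = load_edges(sample)
print(intensity(edges, 58, 1))  # same value as intensity(edges, 1, 58)
```

For a real file, the `io.StringIO` object would be replaced by an open file handle, for example `open("1-edges.csv", newline="")`, assuming that is how the first edge file is named on disk.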

Attribute Information

This data set was crawled from YouTube (http://www.youtube.com/) in December 2008. YouTube is a video-sharing site where various interactions occur between users. In particular, we crawled 30,522 user profiles. For each user, we crawled their contacts, subscriptions, and favorite videos. To avoid sample selection bias, we chose the authors of 100 recently uploaded videos as the seed set. The crawl reached 848,003 users and 1,299,642 videos in total. However, not all users share all kinds of information. After removing those users, we have 15,088 active user profiles.
Based on the crawled information, we construct 5 different interactions between the 15,088 users. Specifically, they are:

1. the contact network between the 15,088 users;

2. the number of shared friends between two users among the 848,003 contacts (excluding the 15,088);

3. the number of shared subscriptions between two users;

4. the number of shared subscribers between two users;

5. the number of shared favorite videos.

Details can be found in the related reference.
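
Interactions 2 to 5 above are all counts of items that two users have in common. As an illustration only, and not the authors' construction code, the sketch below counts shared items per user pair through an inverted index. The function name `shared_counts` and the toy subscription data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def shared_counts(items_by_user):
    """Count, for each user pair, how many items (e.g. subscriptions) they share."""
    # Invert the mapping: item -> set of users who have it.
    users_by_item = defaultdict(set)
    for user, items in items_by_user.items():
        for item in items:
            users_by_item[item].add(user)

    # Every pair of users attached to the same item shares one more item.
    counts = defaultdict(int)
    for users in users_by_item.values():
        for u, v in combinations(sorted(users), 2):
            counts[(u, v)] += 1
    return dict(counts)


# Hypothetical toy data: user id -> set of channels the user subscribes to.
subscriptions = {1: {"a", "b", "c"}, 58: {"b", "c", "d"}, 60: {"d"}}
counts = shared_counts(subscriptions)
print(counts)
```

Each key is an ordered pair `(smaller id, larger id)`, which matches the convention of listing a symmetric interaction only once. Pairs with no shared items are simply absent from the result.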

Relevant Papers

Lei Tang, Xufei Wang, and Huan Liu. "Uncovering Groups via Heterogeneous Interaction Analysis", IEEE International Conference on Data Mining (ICDM09), Dec. 6-9, 2009, Miami, Florida.

Lei Tang and Huan Liu. "Uncovering Cross-Dimension Group Structures in Multi-Dimensional Networks", in SDM workshop on Analysis of Dynamic Networks, 2009.
How to Cite
If you publish material based on data from this repository, please acknowledge the Data Lab Social Computing Data Repository at Syracuse University in your acknowledgements. This helps others find and replicate your work.

APA Format

R. Zafarani and H. Liu. (2026). Social Computing Data Repository [https://datasets.syr.edu]. Data Lab, Syracuse University.

BibTeX

@misc{DataLab:SU,
  author       = {R. Zafarani and H. Liu},
  year         = {2026},
  title        = {Social Computing Data Repository},
  url          = {https://datasets.syr.edu},
  institution  = {Data Lab, Syracuse University}
}