BIG 2016 CUP

Academic data is one of the most important resources for researchers all over the world. Despite significant efforts by the builders of academic databases, errors still abound. A very common one is incorrect author identification: papers are frequently attributed to the wrong authors.

Microsoft provides a Microsoft Academic Graph data set and an online graph query interface for BIG 2016 CUP. Participants must provide a REST service endpoint that verifies whether a given paper was written by a given author. Innovative heterogeneous approaches combining machine learning and graph computation techniques are encouraged.


Microsoft provides the Microsoft Academic Graph (MAG) data set (2015-11-06). MAG is a large heterogeneous graph containing scientific publication records, citation relationships between publications, as well as authors, institutions, journal and conference "venues," and fields of study. The data is distributed as a set of zipped tab-separated values files stored in Microsoft Azure blob storage and downloadable via HTTP. The zipped files total ~37 GB; the decompressed data is ~92.3 GB. Before downloading the data set, participants must read and accept the terms of use for the Microsoft Academic Graph.
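
Because the data ships as zipped tab-separated files, it can be convenient to stream rows straight out of an archive instead of decompressing ~92.3 GB to disk first. The following is a minimal Python sketch of that idea, not an official loader. The archive path and member name passed to it are hypothetical placeholders, and the sketch assumes UTF-8 text with no quoting conventions (quote characters are treated as ordinary data).

```python
import csv
import io
import zipfile
from typing import Iterator, List


def iter_tsv_rows(zip_path: str, member: str) -> Iterator[List[str]]:
    """Yield the rows of a tab-separated file stored inside a zip archive.

    Both arguments are supplied by the caller; this sketch does not assume
    any particular MAG file name.
    """
    with zipfile.ZipFile(zip_path) as archive:
        with archive.open(member) as raw:
            # Decode lazily so the whole file is never held in memory.
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            # QUOTE_NONE keeps quote characters in titles as plain data.
            reader = csv.reader(text, delimiter="\t", quoting=csv.QUOTE_NONE)
            for row in reader:
                yield row
```

A caller would iterate over, for example, iter_tsv_rows("downloaded_archive.zip", "some_table.txt"), where both names stand in for whatever files were actually downloaded.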

The schema of MAG is also provided to help participants understand the graph. The paper rank score reflects the quantized natural log of the importance of a paper based on the number and quality of papers citing it.

Besides MAG, participants are welcome to use external data in their approaches as long as that data is publicly accessible.

Evaluation Metric

The test cases are not available before evaluation. When the evaluation starts, the evaluator system pushes test cases to the REST endpoint of each team individually. The evaluation process consists of two phases.

Phase 1: Final Evaluation. Each team will receive a set of test cases. The overall accuracy is recorded as P1 score.

Phase 2: Additional Test. This test is only for the teams that get the same P1 score. If team A and team B receive the same P1 score, then the evaluator system will keep sending test cases to A and B until the tie is broken. The score from Phase 2 is called P2 score.

Teams are ranked by P1 score. Teams with the same P1 score are ordered by P2 score.
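
The ranking rule above can be sketched in a few lines of Python. This is only an illustration, under two assumptions: accuracy is taken to be the fraction of test cases answered correctly, and a team that never entered Phase 2 is given a P2 score of 0.0. The function names and the score layout are hypothetical, not part of the evaluator system.

```python
from typing import Dict, List, Tuple


def accuracy(predictions: List[str], answers: List[str]) -> float:
    """Fraction of responses that match the expected "yes"/"no" answers."""
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        return 0.0
    correct = sum(p == a for p, a in zip(predictions, answers))
    return correct / len(answers)


def rank_teams(scores: Dict[str, Tuple[float, float]]) -> List[str]:
    """Order team names by P1 score, breaking ties with P2 score.

    `scores` maps a team name to a (P1, P2) pair; higher is better for both.
    """
    return sorted(scores, key=lambda team: (-scores[team][0], -scores[team][1]))
```

For instance, with scores {"A": (0.9, 0.5), "B": (0.9, 0.7), "C": (0.95, 0.0)}, team C ranks first on P1 alone, and B precedes A because of its higher P2 score.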

The Phase 1 test lasts 3 days. The Phase 2 test lasts 1 day. Before Phase 1, there will be a 2-day time window for a test drive. During the test drive, the evaluator will send test requests to the REST endpoint of each team. The test drive score will not count toward the final rank.

Microsoft Academic Graph Query API

We provide a graph query interface for querying Microsoft Academic Graph via graph exploration. The interface, powered by Graph Engine, can not only query entities that meet certain criteria (e.g. find a paper with a given title), but also perform pattern matching via graph exploration (e.g. detect co-authorship).

The query interface is provided as a RESTful endpoint. The input and output of a query are encoded as JSON objects. More details can be found here.

REST Service Endpoint

For each test case, the REST service endpoint will receive an HTTP request that consists of a pair of paper_id and author_id in JSON format, where the ids are integers, e.g. {"paper_id": 123, "author_id": 456}. The service endpoint needs to respond with either {"result": "yes"} or {"result": "no"} within 90 seconds. If the given paper_id and author_id represent a correct paper-author pair, {"result": "yes"} is the correct answer; otherwise, {"result": "no"} is the correct answer. After receiving the response, the evaluator will wait for a random period of time before sending the next request.
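
As an illustration of the request and response shapes described above, here is a minimal Python sketch of such an endpoint using only the standard library. It rests on several assumptions: the evaluator is assumed to send the JSON body in a POST request (the HTTP method is not stated here), the port number is arbitrary, and verify_pair is a hypothetical placeholder that always answers "no" and would need to be replaced by a team's actual model.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def verify_pair(paper_id: int, author_id: int) -> bool:
    """Hypothetical placeholder: a real entry would consult its model here."""
    return False


def build_response(body: bytes) -> bytes:
    """Turn a JSON request body into the JSON response body."""
    request = json.loads(body)
    paper_id = int(request["paper_id"])
    author_id = int(request["author_id"])
    answer = "yes" if verify_pair(paper_id, author_id) else "no"
    return json.dumps({"result": answer}).encode("utf-8")


class VerifyHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:  # assumption: test cases arrive via POST
        length = int(self.headers.get("Content-Length", 0))
        payload = build_response(self.rfile.read(length))
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port: int = 8080) -> None:
    """Run the endpoint until interrupted; the port is an arbitrary choice."""
    HTTPServer(("0.0.0.0", port), VerifyHandler).serve_forever()
```

Calling serve() starts the endpoint. Keeping build_response separate from the handler makes the yes/no logic easy to test without opening a socket, and any real verification logic would need to finish well inside the 90-second limit.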


The challenge is a team-based contest. Each team can have one or more members, and an individual can be a member of multiple teams. However, no two teams may share more than half of their members.

Each team must register during the registration time window.


Wei-Ying Ma, Microsoft
Kuansan Wang, Microsoft
Bin Shao, Microsoft
Yatao Li, Microsoft