Libra is a conceptual Pregel-clone system that optimizes for large graph processing based on the graph structure. It is previously known as “Mizan: Optimizing Graph Mining in Large Parallel Systems” but we changed Mizan to Libra to differentiate between our Pregel-clones. Libra is a vertex centric BSP graph processor that is compatible with Pregel’s API, written in both Java and C++ and depends on MPI for communication. We don’t share the code of Libra’s because it is a conceptional system and not for production, however we will be integrating the concepts of Libra into Mizan in the near future.
Libra is not yet published work, still under review, but we provide below the abstract of Libra, Libra’s components and some use cases for Libra.
Extracting information from graphs, from finding shortest paths to complex graph mining, is essential for many applications. Due to the shear size of modern graphs (e.g., social networks), processing must be done on large parallel computing infrastructures (e.g., the cloud). Earlier approaches relied on the MapReduce framework, which was proved inadequate for graph algorithms. More recently, the message passing model (e.g., Pregel) has emerged. Although the Pregel model has many advantages, it is agnostic to the graph properties and the architecture of the underlying computing infrastructure, often leading to suboptimal performance.
In this paper, we propose Libra, a layer between the users’ code and the computing infrastructure. Libra considers the structure of the input graph and the architecture of the infrastructure in order to: (i) decide whether it is beneficial to generate a near-optimal partitioning of the graph in a pre-processing step, and (ii) choose between typical point-to-point message passing and a novel approach that puts computing nodes in a virtual overlay ring. We deployed Libra on a small local Linux cluster, on the cloud (256 virtual machines in Amazon EC2), and on an IBM Blue Gene/P supercomputer (512 CPUs). We show that Libra executes common algorithms on very large graphs up to one order of magnitude faster than implementations relying on Pregel-like hash-based graph partitioning.
Libra is an optimization layer between the implementation of the graph processing algorithm and the physical computing infrastructure. It utilizes two different computation models, Libra-α and Libra-γ, to achieve an optimized massage delivery in the bulk synchronous graph processing system. Libra-α is more suitable for graphs that are roughly power-law, whereas Libra-γ is more appropriate for other graph. As a result, Libra alternates between Libra-α and Libra-γ according to the input graph structure.
Libra uses curve fitting to detect if the input graph has a power-law edge distribution or not. If the input graph follows a power-law distribution, we use Libra-α to process the input graph, which a min-cuts partitioning used to partition the input graph and point-to-point communication is used between the workers. If the input graph does not follow a power-law distribution, on the other hand, we use Libra-γ to process the input graph. Libra-γ uses hash-based partitioning to quickly partition the input graph and utilizes ring overlay communication between the workers to deliver the messages.