Preview

«System analysis and applied information science»

Advanced search

Tuning block-parallel all-pairs shortest path algorithm for efficient multi-core implementation

https://doi.org/10.21122/2309-4923-2022-3-57-65

Abstract

Finding shortest paths in a weighted graph is one of the key problems in computer-science, which has numerous practical applications in multiple domains. This paper analyzes the parallel blocked all-pairs shortest path algorithm at the aim of evaluating the influence of the multi-core system and its hierarchical cache memory on the parameters of algorithm implementation depending on the size of the graph and the size of distance matrix’s block. It proposes a technique of tuning the block-size to the given multi-core system. The technique involves profiling tools in the tuning process and allows the increase of the parallel algorithm throughput. Computational experiments carried out on a rack server equipped with two Intel Xeon E5-2620 v4 processors of 8 cores and 16 hardware threads each have convincingly shown for various graph sizes that the behavior and parameters of the hierarchical cache memory operation don’t depend on the graph size and are determined only by the distance matrix’s block size. To tune the algorithm to the target multi-core system, the preferable block size can be found once for the graph size whose in-memory matrix representation is larger than the size of cache shared among all processor’s cores. Then this blocksize can be reused on graphs of bigger size for efficient solving the all-pairs shortest path problem

About the Authors

O. N. Karasik
Belarusian National Technical University
Belarus

Karasik Oleg, Technology Lead at ISsoft Solutions (part of Coherent Solutions) in Minsk, Belarus, and PhD in Technical Science

 Minsk



A. A. Prihozhy
Belarusian National Technical University
Belarus

Anatoly Prihozhy, full professor at the Computer and system software department of Belarus national technical university, doctor of science and full professor

Minsk



References

1. Schrijver, A. On the history of the shortest path problem / A. Schrijver // Documenta Mathematica. – 2012. – Vol. 17, №. 1. – P. 155–167.

2. Anu, P. Finding All-Pairs Shortest Path for a Large-Scale Transportation Network Using Parallel Floyd-Warshall and Parallel Dijkstra Algorithms / P. Anu, M. G. (Kumar) // Journal of Computing in Civil Engineering. – 2013. – Vol. 27, №. 3. – P. 263–273.

3. Atachiants, R. Parallel Performance Problems on Shared-Memory Multicore Systems: Taxonomy and Observation / R. Atachiants, G. Doherty, D. Gregg // IEEE Transactions on Software Engineering. – 2016. – Vol. 42, №. 8. – P. 764–785.

4. Zheng, Y. Performance evaluation of exclusive cache hierarchies / Y. Zheng, B. T. Davis, M. Jordan. – 2004. – P. 89–96.

5. Prihozhy, A.A. Investigation of methods for implementing multithreaded applications on multicore systems / A.A. Prihozhy, O.N. Karasik // Informatization of education. – 2014. – No. 1. – P. 43–62.

6. Prihozhy, A.A. Cooperative model for optimization of execution of threads on multi-core system / A.A. Prihozhy, O.N. Karasik // System analysis and applied information science. – 2014. – No. 4. – P. 13–20.

7. Park, J. Optimizing graph algorithms for improved cache performance / J. Park, M. Penner, V. K. Prasanna // IEEE Transactions on Parallel and Distributed Systems. – 2004. – Vol. 15, №. 9. – P. 769–782.

8. Floyd, R. W. Algorithm 97: Shortest Path / R. W. Floyd // Communications of the ACM. – 1962. – Vol. 5, №. 6. – P. 345-.

9. Venkataraman, G. A Blocked All-Pairs Shortest Paths Algorithm / G. Venkataraman, S. Sahni, S. Mukhopadhyaya // Journal of Experimental Algorithmics (JEA). – 2003. – Vol. 8. – P. 857–874.

10. Albalwi, E. Task Level Parallelization of All Pair Shortest Path Algorithm in OpenMP 3.0 / E. Albalwi, P. Thulasiraman, R. Thulasiram // Advances in Computer Science and Engineering (CSE 2013), Los Angeles. – Los Angeles: Atlantis Press, 2013. – P. 109–112.

11. Tang, P. Rapid development of parallel blocked all-pairs shortest paths code for multi-core computers / P. Tang // IEEE SOUTHEASTCON 2014, Lexington, KY, USA. – Lexington, KY, USA: IEEE, 2014. – P. 1–7.

12. Singh, A. Performance Analysis of Floyd Warshall Algorithm vs Rectangular Algorithm / A. Singh, P. K. Mishra // International Journal of Computer Applications. – 2014. – Vol. 107, №. 16. – P. 23–27.

13. An Experimental Study of a Parallel Shortest Path Algorithm for Solving Large-Scale Graph Instances / K. Madduri [et al.] // 2007 Proceedings of the Ninth Workshop on Algorithm Engineering and Experiments (ALENEX). – Society for Industrial and Applied Mathematics, 2007. – P. 23–35.

14. Karasik, O.N. Cooperative multi-threaded scheduler and block-parallel algorithms of solving tasks on multi-core systems / O.N. Karasik. – Belarusian state university of informatics and radio-electronics, 2019.

15. Karasik, O.N. Threaded block-parallel algorithm for finding the shortest pats on graph / O.N. Karasik, A.A. Prihozhy // Doklady BGUIR. – 2018. – No. 2. – P. 77–84.

16. Prihozhy, A.A. Cooperative block-parallel algorithms for task execution on multi-core system / A.A. Prihozhy, O.N. Karasik // System analysis and applied information science. – 2015. – No. 2. – P. 10–18.

17. Prihozhy, A.A. Simulation of direct mapped, k-way and fully associative cache on all pairs shortest paths algorithms / A.A. Prihozhy // System analysis and applied information science. – 2019. – No. 4. – P. 10–18.

18. Prihozhy, А.A. Inference of shortest path algorithms with spatial and temporal locality for Big Data processing / A.A. Prihozhy, O.N. Karasik // Big Data and Advanced Analytics: Proc. VIII Intern. Conf., Minsk, May 11-12, 2022. – Minsk: Bestprint, 2022. – P. 56–66.

19. Intel Corporation. Allow Multiple Runs or Multiplex Events [Electronic resource]. – Mode of access: URL: https://www.intel.com/content/www/us/en/develop/documentation/vtune-help/top/analyze-performance/hw-event-based-sampling-collection/allow-multiple-runs-or-multiplex-events.html. – Date of access: 07.03.2022.

20. Intel Corporation. Hardware Event-based Sampling Collection [Electronic resource]. – Mode of access: URL: https://www.intel.com/content/www/us/en/develop/documentation/vtune-help/top/analyze-performance/hw-event-based-sampling-collection.html – Date of access: 07.03.2022).


Review

For citations:


Karasik O.N., Prihozhy A.A. Tuning block-parallel all-pairs shortest path algorithm for efficient multi-core implementation. «System analysis and applied information science». 2022;(3):57-65. https://doi.org/10.21122/2309-4923-2022-3-57-65

Views: 371


Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 2309-4923 (Print)
ISSN 2414-0481 (Online)