HEP ML LAB: An end-to-end framework for applying machine learning to phenomenology studies

  • Recent years have seen rapid growth in the use of machine learning in high-energy physics. However, further effort is needed to explore its full potential. To simplify the application of existing algorithms and neural networks and to improve the reproducibility of analyses, we developed HEP ML LAB ($\mathrm{hml}$), a Python-based, end-to-end framework for phenomenology studies. It covers the complete workflow from event generation to performance evaluation and provides a consistent interface across different approaches. We propose an observable naming convention to streamline data extraction and conversion. Following the Keras style, we provide the traditional cut-and-count method and boosted decision trees alongside neural networks. Taking $W^+$ tagging as an example, we evaluate all built-in approaches using significance and background rejection as metrics. Owing to its modular design, HEP ML LAB is easy to extend and customize, and can serve both beginners and experienced researchers.
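The abstract names significance and background rejection as the evaluation metrics but does not define them on this page. As an illustration only, the sketch below assumes two common conventions: the simple ratio $s/\sqrt{b}$ and its Asimov refinement $Z = \sqrt{2[(s+b)\ln(1+s/b) - s]}$ for significance, and $1/\epsilon_B$ at a fixed signal efficiency for background rejection. The function names are hypothetical and are not taken from the $\mathrm{hml}$ API.

```python
import math

import numpy as np


def simple_significance(s: float, b: float) -> float:
    """Naive significance s/sqrt(b); valid when s << b (assumed convention)."""
    if b <= 0:
        raise ValueError("background yield must be positive")
    return s / math.sqrt(b)


def asimov_significance(s: float, b: float) -> float:
    """Median discovery significance in the Asimov approximation.

    Reduces to s/sqrt(b) in the limit s << b (assumed convention).
    """
    if b <= 0:
        raise ValueError("background yield must be positive")
    return math.sqrt(2.0 * ((s + b) * math.log(1.0 + s / b) - s))


def background_rejection(y_true, y_score, signal_efficiency=0.5) -> float:
    """Return 1/epsilon_B at the score threshold keeping the requested signal fraction.

    y_true: 1 for signal (e.g. W+ jets), 0 for background.
    y_score: classifier output, larger means more signal-like.
    """
    y_true = np.asarray(y_true, dtype=bool)
    y_score = np.asarray(y_score, dtype=float)
    # Choose the threshold so that `signal_efficiency` of signal scores lie at or above it.
    threshold = np.quantile(y_score[y_true], 1.0 - signal_efficiency)
    # Fraction of background events that survive the same threshold.
    eps_b = np.mean(y_score[~y_true] >= threshold)
    return math.inf if eps_b == 0 else 1.0 / eps_b


if __name__ == "__main__":
    # Toy yields after a hypothetical selection.
    print(simple_significance(10, 100), asimov_significance(10, 100))
    labels = [1, 1, 1, 1, 0, 0, 0, 0]
    scores = [0.9, 0.8, 0.7, 0.6, 0.65, 0.3, 0.2, 0.1]
    print(background_rejection(labels, scores, signal_efficiency=1.0))
```

Reporting rejection at a fixed signal efficiency lets cut-and-count, boosted decision trees, and neural networks be compared on equal footing even though their raw outputs live on different scales.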

Figures(7)/Tables(3)

Citation
Jing Li and Hao Sun. HEP ML LAB: An end-to-end framework for applying machine learning into phenomenology studies[J]. Chinese Physics C. doi: 10.1088/1674-1137/addcc9
Milestone
Received: 2025-03-28