Title 基於人物屬性特徵之多視角監控影片檢索管理系統設計
Design of a Multi-view Surveillance Video Retrieval and Management System Based on Pedestrian Attributes
Author Wang, Huai-Yi (王懷憶)
Contributors Liao, Chun-Feng (廖峻鋒)
Wang, Huai-Yi (王懷憶)
Keywords Surveillance Video Retrieval and Management
Pedestrian Attribute Recognition
YOLOv8n
PP-Human
Weighted Cosine Similarity
Date 2025
Upload time 1-Sep-2025 16:20:19 (UTC+8)
Abstract Surveillance camera technology and artificial intelligence models have both advanced rapidly in recent years, and intelligent surveillance systems have become important tools for urban safety and site management. These cameras not only record scenes in real time but also recognize content automatically, producing structured data such as person attributes, object categories, detected behaviors, and scene semantics. Most existing surveillance systems, however, still retrieve footage only by timeline and camera, and so fail to use this data. Finding a specific target in a large video archive therefore remains slow, depends heavily on manual one-by-one review, and is prone to misjudgment.
To address these problems, this study proposes a video retrieval and management method built on attribute tag indexing. The method combines object detection with pedestrian attribute recognition to annotate the attributes of people in surveillance footage and converts the annotations into structured index data, so that videos can be retrieved by their content. The study also designs and implements a complete system: a backend module receives and processes the attribute data produced by the AI models and analyzes semantic relationships across multiple videos, and a frontend interface visualizes retrieval results together with a map of the relationships between videos.
Empirical experiments and case tests indicate that attribute tag indexing significantly improves video search efficiency and management performance. Compared with traditional search, users locate target segments faster and more accurately, with less unnecessary browsing and manual effort, and the system is more intuitive and usable. The results are expected to serve as a reference for data management in future intelligent surveillance systems and to extend the value of video data in public safety, traffic, commercial, and other applications.
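The retrieval idea in the abstract, combined with the "Weighted Cosine Similarity" keyword, can be illustrated with a minimal sketch. This is not the thesis's implementation: the attribute vocabulary, the weights, the index records, and the function names `weighted_cosine` and `search` are all hypothetical, chosen only to show how attribute tags stored as structured index data could be ranked against a query.

```python
import math

# Hypothetical attribute vocabulary (the record does not list the real one).
ATTRIBUTES = ["male", "backpack", "hat", "long_sleeve", "trousers"]

# Hypothetical weights: distinctive attributes count more than common ones.
WEIGHTS = [1.0, 2.0, 2.0, 0.5, 0.5]


def weighted_cosine(a, b, w):
    """Weighted cosine similarity: sum(w*a*b) / (sqrt(sum(w*a^2)) * sqrt(sum(w*b^2)))."""
    dot = sum(wi * ai * bi for wi, ai, bi in zip(w, a, b))
    norm_a = math.sqrt(sum(wi * ai * ai for wi, ai in zip(w, a)))
    norm_b = math.sqrt(sum(wi * bi * bi for wi, bi in zip(w, b)))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an all-zero vector matches nothing
    return dot / (norm_a * norm_b)


def search(index, query, weights, top_k=3):
    """Rank indexed detections by similarity to the query attribute vector."""
    scored = [(weighted_cosine(query, rec["attrs"], weights), rec) for rec in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Hypothetical structured index: one record per detected person,
# with binary attribute tags in the order given by ATTRIBUTES.
INDEX = [
    {"video": "cam1.mp4", "second": 12, "attrs": [1, 1, 0, 1, 1]},
    {"video": "cam2.mp4", "second": 47, "attrs": [1, 1, 1, 0, 1]},
    {"video": "cam3.mp4", "second": 5, "attrs": [0, 0, 0, 1, 1]},
]

query = [1, 1, 1, 0, 0]  # "male, with backpack and hat"
results = search(INDEX, query, WEIGHTS)
for score, rec in results:
    print(f"{rec['video']} @ {rec['second']}s  score={score:.3f}")
```

With these made-up records, the detection in `cam2.mp4` ranks first because it shares all three queried attributes, and the `cam3.mp4` record scores zero because it shares none. In a real deployment the attribute vectors would come from the recognition models and the index would live in a database rather than in a Python list.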
Description Master's thesis
National Chengchi University
In-service Master's Program, Department of Computer Science
112971017
Source http://thesis.lib.nccu.edu.tw/record/#G0112971017
Type thesis
dc.contributor.advisor Liao, Chun-Feng (廖峻鋒)
dc.identifier (Other Identifiers) G0112971017
dc.identifier.uri (URI) https://nccur.lib.nccu.edu.tw/handle/140.119/159300
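dc.description.tableofcontents
Abstract
Acknowledgements
Table of Contents
List of Tables
Chapter 1 Introduction
  Section 1 Research Background and Motivation
  Section 2 Research Objectives and Questions
  Section 3 Expected Contributions and Research Process
Chapter 2 Literature Review
  Section 1 Overview of Image Analysis
  Section 2 Object Recognition
  Section 3 Pedestrian Attribute Recognition
  Section 4 Mainstream PAR Models and Datasets
  Section 5 Retrieval Systems for Surveillance Video
Chapter 3 System Design
  Section 1 System Architecture
  Section 2 Data Flow and Module Interaction
Chapter 4 System Implementation
  Section 1 Designing and Implementing the Image Analysis Module with YOLO and PADDLE
  Section 2 Designing a POSTGRES SQL Database to Store Structured Data
  Section 3 Building Backend Services and APIs with FASTAPI
  Section 4 Implementing the Frontend Interface to Present Search Results
Chapter 5 System Evaluation
  Section 1 Definition of Search Time and Efficiency Test Results
  Section 2 Interaction Scenarios
  Section 3 Retrieval Requirements and Prediction Performance
  Section 4 Usability Testing
  Section 5 Research Questions and Discussion
Chapter 6 Conclusion
References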
dc.format.extent 11387089 bytes
dc.format.mimetype application/pdf