This article comes from HPMicro developer @Xusiwei1236 and describes how to run an edge AI framework on the HPM6750. Read on if you are interested.
--------------- Review content follows ---------------
What is TFLM?
You have probably heard of TensorFlow, the open-source machine learning library developed by Google, which supports both model training and model inference.
TFLM, the subject of this article, stands for TensorFlow Lite for Microcontrollers. So what is TensorFlow Lite?
TensorFlow Lite (usually abbreviated TFLite) is a solution the TensorFlow team developed for deploying models to mobile devices. Put simply, it is the mobile version of TensorFlow. The TensorFlow website describes TFLite as follows:
"TensorFlow Lite is a set of tools that helps developers run models on mobile, embedded, and IoT devices, enabling on-device machine learning."
TensorFlow Lite for Microcontrollers (TFLM), in turn, is the microcontroller version of TensorFlow Lite. The website describes it as follows:
"TensorFlow Lite for Microcontrollers (hereafter TFLM) is an experimental port of TensorFlow Lite designed for microcontrollers and other devices with only a few kilobytes of memory. It can run directly on "bare metal" and requires no operating system support, no standard C/C++ libraries, and no dynamic memory allocation. The core runtime needs only 16 KB on a Cortex M3, and with enough operators to run a speech keyword detection model it needs only 22 KB."
The three projects share one lineage and all come from Google. TensorFlow supports both training and inference, while the other two support inference only. TFLite mainly targets mobile devices such as phones and tablets, and TFLM targets microcontrollers. Historically, the latter two are offshoots of the TensorFlow project. Put another way, the three form a tree: TFLite split off from TensorFlow, TFLite-Micro split off from TFLite, and all three are now developed in parallel. For a long time the source code of all three was maintained in a single repository, and in terms of directory containment TensorFlow contained the other two and TFLite contained tflite-micro.
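TFLM in the HPM SDK
TFLM middleware
The HPM SDK integrates TFLM as middleware (similar to a library, but not built as a separate library), located in the hpm_sdk\middleware subdirectory:
The code in this subdirectory is trimmed down from the TFLM open-source project, with many unneeded files removed.
TFLM example
The HPM SDK also provides a TFLM example, located in the hpm_sdk\samples\tflm subdirectory:

The example code is adapted from the official person_detection example, with camera image capture and LCD display of the results added.
Because I do not have the matching camera and display, this article does not use that example for the experiment.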
Running the TFLM benchmark on the HPM6750
The following uses the person detection benchmark as an example to show how to run the TFLM benchmark on the HPM6750.
Adding the person detection benchmark source code to the HPM SDK environment
Follow these steps to add the person detection benchmark source files to the HPM SDK environment:
Create a tflm_person_detect_benchmark directory under the samples subdirectory of the HPM SDK, and create a src directory inside it;
From the tflite-micro directory described above, in which the person detection benchmark has already been run, copy the following files into the src directory:
tensorflow\lite\micro\benchmarks\person_detection_benchmark.cc
tensorflow\lite\micro\benchmarks\micro_benchmark.h
tensorflow\lite\micro\examples\person_detection\model_settings.h
tensorflow\lite\micro\examples\person_detection\model_settings.cc
Create a testdata subdirectory under src, and copy all files from the following directory under tflite-micro into testdata:
tensorflow\lite\micro\tools\make\gen\linux_x86_64_default\genfiles\tensorflow\lite\micro\examples\person_detection\testdata
In person_detection_benchmark.cc, model_settings.cc, no_person_image_data.cc, and person_image_data.cc, change the file paths in some of the #include directives to match the relative paths after copying;
In person_detection_benchmark.cc, add the line board_init(); at the very start of the main function, and add the line #include "board.h" at the top of the file.
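The #include path change in the steps above amounts to dropping the original directory prefix, since the copied headers now sit directly under src (or src\testdata). The following is a minimal sketch of that rewrite for a single line of source text. StripIncludePrefix is a hypothetical helper written for illustration, not part of TFLM or the HPM SDK, and the prefix used in the usage note is taken from the file locations listed above; check the actual #include lines in your copy of the files before editing them.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not part of TFLM or the HPM SDK).
// If `line` is a quoted #include whose path starts with `old_prefix`,
// return the line with that prefix removed; otherwise return it unchanged.
inline std::string StripIncludePrefix(const std::string& line,
                                      const std::string& old_prefix) {
  assert(!old_prefix.empty());
  const std::string head = "#include \"" + old_prefix;
  if (line.compare(0, head.size(), head) != 0) {
    return line;  // Not an include of a file under old_prefix.
  }
  // Keep the directive and opening quote, drop only the directory prefix.
  return "#include \"" + line.substr(head.size());
}
```

For example, with the prefix tensorflow/lite/micro/examples/person_detection/, an include of model_settings.h under that directory becomes #include "model_settings.h", while an unrelated line such as #include "board.h" is returned unchanged.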
Adding the CMakeLists.txt and app.yaml files
Create a CMakeLists.txt file at the same level as src, with the following content:
cmake_minimum_required(VERSION 3.13)
set(CONFIG_TFLM 1)
find_package(hpm-sdk REQUIRED HINTS $ENV{HPM_SDK_BASE})
project(tflm_person_detect_benchmark)
set(CMAKE_CXX_STANDARD 11)
sdk_app_src(src/model_settings.cc)
sdk_app_src(src/person_detection_benchmark.cc)
sdk_app_src(src/testdata/no_person_image_data.cc)
sdk_app_src(src/testdata/person_image_data.cc)
sdk_app_inc(src)
sdk_ld_options("-lm")
sdk_ld_options("--std=c++11")
sdk_compile_definitions(__HPMICRO__)
sdk_compile_definitions(-DINIT_EXT_RAM_FOR_DATA=1)
# sdk_compile_options("-mabi=ilp32f")
# sdk_compile_options("-march=rv32imafc")
sdk_compile_options("-O2")
# sdk_compile_options("-O3")
set(SEGGER_LEVEL_O3 1)
generate_ses_project()
Create an app.yaml file at the same level as src, with the following content:
dependency:
  - tflm
Building and running the TFLM benchmark
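Next comes the familiar part: build and run. First, use generate_project to generate the project:
Then connect the HPM6750 development board to the PC and open the newly generated project in Embedded Studio:
Because the project pulls in the TFLM source code, it contains a large number of files, so the Indexing step in the source navigator pane on the right takes a long time to finish.
After that, press F7 to build the project and F5 to debug it:

Once the build completes, first open a serial terminal and connect it to the board's serial port at 115200 baud. After starting the debug session, simply continue execution, and the benchmark output appears in the serial terminal: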
==============================
 hpm6750evkmini clock summary
==============================
cpu0: 816000000Hz
cpu1: 816000000Hz
axi0: 200000000Hz
axi1: 200000000Hz
axi2: 200000000Hz
ahb: 200000000Hz
mchtmr0: 24000000Hz
mchtmr1: 1000000Hz
xpi0: 133333333Hz
xpi1: 400000000Hz
dram: 166666666Hz
display: 74250000Hz
cam0: 59400000Hz
cam1: 59400000Hz
jpeg: 200000000Hz
pdma: 200000000Hz
==============================
----------------------------------------------------------------------
[HPMicro ASCII-art banner]
----------------------------------------------------------------------
InitializeBenchmarkRunner took 114969 ticks (4 ms).

WithPersonDataIterations(1) took 10694521 ticks (445 ms)
DEPTHWISE_CONV_2D took 275798 ticks (11 ms).
DEPTHWISE_CONV_2D took 280579 ticks (11 ms).
CONV_2D took 516051 ticks (21 ms).
DEPTHWISE_CONV_2D took 139000 ticks (5 ms).
CONV_2D took 459646 ticks (19 ms).
DEPTHWISE_CONV_2D took 274903 ticks (11 ms).
CONV_2D took 868518 ticks (36 ms).
DEPTHWISE_CONV_2D took 68180 ticks (2 ms).
CONV_2D took 434392 ticks (18 ms).
DEPTHWISE_CONV_2D took 132918 ticks (5 ms).
CONV_2D took 843014 ticks (35 ms).
DEPTHWISE_CONV_2D took 33228 ticks (1 ms).
CONV_2D took 423288 ticks (17 ms).
DEPTHWISE_CONV_2D took 62040 ticks (2 ms).
CONV_2D took 833033 ticks (34 ms).
DEPTHWISE_CONV_2D took 62198 ticks (2 ms).
CONV_2D took 834644 ticks (34 ms).
DEPTHWISE_CONV_2D took 62176 ticks (2 ms).
CONV_2D took 838212 ticks (34 ms).
DEPTHWISE_CONV_2D took 62206 ticks (2 ms).
CONV_2D took 832857 ticks (34 ms).
DEPTHWISE_CONV_2D took 62194 ticks (2 ms).
CONV_2D took 832882 ticks (34 ms).
DEPTHWISE_CONV_2D took 16050 ticks (0 ms).
CONV_2D took 438774 ticks (18 ms).
DEPTHWISE_CONV_2D took 27494 ticks (1 ms).
CONV_2D took 974362 ticks (40 ms).
AVERAGE_POOL_2D took 2323 ticks (0 ms).
CONV_2D took 1128 ticks (0 ms).
RESHAPE took 184 ticks (0 ms).
SOFTMAX took 2249 ticks (0 ms).

NoPersonDataIterations(1) took 10694160 ticks (445 ms)
DEPTHWISE_CONV_2D took 274922 ticks (11 ms).
DEPTHWISE_CONV_2D took 281095 ticks (11 ms).
CONV_2D took 515380 ticks (21 ms).
DEPTHWISE_CONV_2D took 139428 ticks (5 ms).
CONV_2D took 460039 ticks (19 ms).
DEPTHWISE_CONV_2D took 275255 ticks (11 ms).
CONV_2D took 868787 ticks (36 ms).
DEPTHWISE_CONV_2D took 68384 ticks (2 ms).
CONV_2D took 434537 ticks (18 ms).
DEPTHWISE_CONV_2D took 133071 ticks (5 ms).
CONV_2D took 843202 ticks (35 ms).
DEPTHWISE_CONV_2D took 33291 ticks (1 ms).
CONV_2D took 423388 ticks (17 ms).
DEPTHWISE_CONV_2D took 62190 ticks (2 ms).
CONV_2D took 832978 ticks (34 ms).
DEPTHWISE_CONV_2D took 62205 ticks (2 ms).
CONV_2D took 834636 ticks (34 ms).
DEPTHWISE_CONV_2D took 62213 ticks (2 ms).
CONV_2D took 838212 ticks (34 ms).
DEPTHWISE_CONV_2D took 62239 ticks (2 ms).
CONV_2D took 832850 ticks (34 ms).
DEPTHWISE_CONV_2D took 62217 ticks (2 ms).
CONV_2D took 832856 ticks (34 ms).
DEPTHWISE_CONV_2D took 16040 ticks (0 ms).
CONV_2D took 438779 ticks (18 ms).
DEPTHWISE_CONV_2D took 27481 ticks (1 ms).
CONV_2D took 974354 ticks (40 ms).
AVERAGE_POOL_2D took 1812 ticks (0 ms).
CONV_2D took 1077 ticks (0 ms).
RESHAPE took 341 ticks (0 ms).
SOFTMAX took 901 ticks (0 ms).

WithPersonDataIterations(10) took 106960312 ticks (4456 ms)
NoPersonDataIterations(10) took 106964554 ticks (4456 ms)
As the output shows, on the HPM6750EVKMINI development board, running the person detection model 10 times in a row takes 4456 ms in total, an average of 445.6 ms per run.
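The tick and millisecond figures in the log are consistent with a 24 MHz tick source (the mchtmr0 entry in the clock summary above) and a result truncated to whole milliseconds. The sketch below shows that conversion. The 24 MHz rate is inferred from the log rather than taken from the benchmark source, and the names kAssumedTicksPerSecond, TicksToMs, and AverageMsPerRun are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: the benchmark tick counter runs at 24 MHz, matching the
// mchtmr0 entry in the clock summary. This is inferred from the log.
constexpr std::uint64_t kAssumedTicksPerSecond = 24000000ULL;

// Convert a tick count to whole milliseconds, truncating the remainder
// (the log values appear to be truncated rather than rounded).
inline std::uint64_t TicksToMs(
    std::uint64_t ticks,
    std::uint64_t ticks_per_second = kAssumedTicksPerSecond) {
  assert(ticks_per_second > 0);
  return ticks * 1000ULL / ticks_per_second;
}

// Average time per run in milliseconds, keeping the fractional part.
inline double AverageMsPerRun(
    std::uint64_t total_ticks, int runs,
    std::uint64_t ticks_per_second = kAssumedTicksPerSecond) {
  assert(runs > 0 && ticks_per_second > 0);
  const double total_ms = static_cast<double>(total_ticks) * 1000.0 /
                          static_cast<double>(ticks_per_second);
  return total_ms / runs;
}
```

Under this assumption, TicksToMs(106960312) reproduces the 4456 ms total reported for the 10-iteration run with a person in the image, and AverageMsPerRun(106960312, 10) gives a per-run average of about 445.7 ms, in line with the 445.6 ms obtained by dividing the truncated total by 10.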
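Running the TFLM benchmark on the Raspberry Pi 3B+
Running the TFLM benchmark on the Raspberry Pi
On the Raspberry Pi 3B+, as on the PC, you can run the PC test command directly to get the benchmark results:

As the output shows, on the Raspberry Pi 3B+, for the image with a person, running the person detection model 10 times in a row takes 4186 ms in total, an average of 418.6 ms per run; for the image without a person, 10 consecutive runs take 4190 ms, an average of 419 ms per run.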
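Comparison of benchmark results on the HPM6750, Raspberry Pi 3B+, and AMD R7 4800H
The average times for running the person detection model on the HPM6750EVKMINI development board, the Raspberry Pi 3B+, and the AMD R7 4800H are summarized below:

In the TFLM person detection workload, the HPM6750EVKMINI and the Raspberry Pi 3B+ perform comparably. Although the HPM6750's 816 MHz CPU clock is lower than the 1.4 GHz clock of the BCM2837 Cortex-A53 in the Raspberry Pi 3B+, its single-core compute performance is not far behind.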
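One way to read this comparison is to convert each average latency into an approximate number of core clock cycles per inference. The sketch below does so using only figures quoted in this article (816 MHz and 445.6 ms for the HPM6750, 1.4 GHz and 418.6 ms for the Raspberry Pi 3B+). It assumes each core runs at its nominal frequency for the whole run and ignores memory, cache, compiler, and operating system effects, so it is a rough estimate only. CyclesPerInference is a hypothetical helper name.

```cpp
#include <cassert>

// Approximate core clock cycles spent on one inference:
//   cycles = clock frequency (Hz) * latency (s)
// Assumes the core runs at its nominal frequency for the entire run.
inline double CyclesPerInference(double cpu_hz, double latency_ms) {
  assert(cpu_hz > 0.0 && latency_ms >= 0.0);
  return cpu_hz * (latency_ms / 1000.0);
}
```

With these inputs, CyclesPerInference(816e6, 445.6) is about 3.6e8 cycles for the HPM6750 and CyclesPerInference(1.4e9, 418.6) is about 5.9e8 cycles for the Raspberry Pi 3B+, a ratio of roughly 1.6. Under the stated assumptions, the HPM6750 spends fewer clock cycles per inference even though its wall-clock time is slightly longer.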
Note that the TFLM benchmark on the Raspberry Pi 3B+ runs on a 64-bit Debian Linux distribution, while the test program on the HPM6750 runs directly on bare metal. The task scheduler in the operating system kernel takes away some of the CPU's compute capacity. This is therefore not a strictly controlled comparison, and the results are for reference only.
(Reference link for this article: http://m.eeworld.com.cn/bbs_thread-1208270-1-1.html)