Basics
npu-smi
npu-smi info
# It's similar to nvidia-smi
yyy@xxx:~$ npu-smi info
+------------------------------------------------------------------------------------------------+
| npu-smi 24.1.0                    Version: 24.1.0                                               |
+---------------------------+---------------+----------------------------------------------------+
| NPU     Name              | Health        | Power(W)     Temp(C)          Hugepages-Usage(page)|
| Chip                      | Bus-Id        | AICore(%)    Memory-Usage(MB)  HBM-Usage(MB)       |
+===========================+===============+====================================================+
| 0       910B              | OK            | xx.x         xxx              0    / 0             |
| 0                         | 1234:d0:12.1  | 0            xxxx / 1xxxx     0    / xxxxx         |
+===========================+===============+====================================================+
| NPU     Chip              | Process id    | Process name             | Process memory(MB)     |
+===========================+===============+====================================================+
| No running processes found in NPU 0                                                            |
+===========================+===============+====================================================+
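npu-smi also supports per-device queries; two examples (subcommands per Huawei's npu-smi documentation, availability varies by driver version):
npu-smi info -t board -i 0    # board details for NPU 0
npu-smi info -t health -i 0   # health status for NPU 0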
atc
yyy@xxx:~$ cat /usr/local/Ascend/ascend-toolkit/set_env.sh
export LD_LIBRARY_PATH=/usr/local/Ascend/driver/lib64:/usr/local/Ascend/driver/lib64/common:/usr/local/Ascend/driver/lib64/driver:$LD_LIBRARY_PATH
export ASCEND_TOOLKIT_HOME=/usr/local/Ascend/ascend-toolkit/latest
export LD_LIBRARY_PATH=${ASCEND_TOOLKIT_HOME}/lib64:${ASCEND_TOOLKIT_HOME}/lib64/plugin/opskernel:${ASCEND_TOOLKIT_HOME}/lib64/plugin/nnengine:${ASCEND_TOOLKIT_HOME}/opp/built-in/op_impl/ai_core/tbe/op_tiling/lib/linux/$(arch):$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=${ASCEND_TOOLKIT_HOME}/tools/aml/lib64:${ASCEND_TOOLKIT_HOME}/tools/aml/lib64/plugin:$LD_LIBRARY_PATH
export PYTHONPATH=${ASCEND_TOOLKIT_HOME}/python/site-packages:${ASCEND_TOOLKIT_HOME}/opp/built-in/op_impl/ai_core/tbe:$PYTHONPATH
export PATH=${ASCEND_TOOLKIT_HOME}/bin:${ASCEND_TOOLKIT_HOME}/compiler/ccec_compiler/bin:${ASCEND_TOOLKIT_HOME}/tools/ccec_compiler/bin:$PATH
export ASCEND_AICPU_PATH=${ASCEND_TOOLKIT_HOME}
export ASCEND_OPP_PATH=${ASCEND_TOOLKIT_HOME}/opp
export TOOLCHAIN_HOME=${ASCEND_TOOLKIT_HOME}/toolkit
export ASCEND_HOME_PATH=${ASCEND_TOOLKIT_HOME}
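The script has to be sourced, not executed, so the exports land in the current shell:
source /usr/local/Ascend/ascend-toolkit/set_env.sh
command -v atc   # should now resolve to the toolkit's bin directory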
cat /usr/local/Ascend/ascend-toolkit/latest/version.cfg
# version: 1.0
runtime_running_version=[7.6.0.1.220:8.0.0]
compiler_running_version=[7.6.0.1.220:8.0.0]
hccl_running_version=[7.6.0.1.220:8.0.0]
opp_running_version=[7.6.0.1.220:8.0.0]
toolkit_running_version=[7.6.0.1.220:8.0.0]
aoe_running_version=[7.6.0.1.220:8.0.0]
ncs_running_version=[7.6.0.1.220:8.0.0]
runtime_upgrade_version=[7.6.0.1.220:8.0.0]
compiler_upgrade_version=[7.6.0.1.220:8.0.0]
hccl_upgrade_version=[7.6.0.1.220:8.0.0]
opp_upgrade_version=[7.6.0.1.220:8.0.0]
toolkit_upgrade_version=[7.6.0.1.220:8.0.0]
aoe_upgrade_version=[7.6.0.1.220:8.0.0]
ncs_upgrade_version=[7.6.0.1.220:8.0.0]
runtime_installed_version=[7.6.0.1.220:8.0.0]
compiler_installed_version=[7.6.0.1.220:8.0.0]
hccl_installed_version=[7.6.0.1.220:8.0.0]
opp_installed_version=[7.6.0.1.220:8.0.0]
toolkit_installed_version=[7.6.0.1.220:8.0.0]
aoe_installed_version=[7.6.0.1.220:8.0.0]
ncs_installed_version=[7.6.0.1.220:8.0.0]
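In each pair, the value after the colon is the CANN release (8.0.0 here). A one-liner to pull it out, assuming the default install path:
grep -m1 _running_version /usr/local/Ascend/ascend-toolkit/latest/version.cfg | cut -d: -f2 | tr -d ']'
# 8.0.0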
echo $PATH | tr ':' '\n' | grep ascend-toolkit
/usr/local/Ascend/ascend-toolkit/latest/bin
/usr/local/Ascend/ascend-toolkit/latest/compiler/ccec_compiler/bin
/usr/local/Ascend/ascend-toolkit/latest/tools/ccec_compiler/bin
ls /usr/local/Ascend/ascend-toolkit/
8.0 8.0.0 latest set_env.sh
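latest is normally a symlink to the active release; readlink shows which one:
readlink -f /usr/local/Ascend/ascend-toolkit/latest
# e.g. /usr/local/Ascend/ascend-toolkit/8.0.0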
atc --help
# atc is the Ascend Tensor Compiler; it converts trained models into offline .om models
ATC start working now, please wait for a moment.
usage: atc <args>
generate offline model example:
atc --model=./alexnet.prototxt --weight=./alexnet.caffemodel --framework=0 --output=./domi --soc_version=<soc_version>
generate offline model for single op example:
atc --singleop=./op_list.json --output=./op_model --soc_version=<soc_version>
===== Basic Functionality =====
[General]
--h/help Show this help message
--mode Run mode.
0: default, generate offline model;
1: convert model to JSON format;
3: only pre-check;
5: convert ge dump txt file to JSON format;
6: display model info;
30: convert original graph to execute-om for nano(offline model)
[Input]
--distributed_cluster_build build model for distribute mode, 1: enable distribute; 0(default): disable distribute
--model Model file
--weight Weight file. Required when framework is Caffe
--om The model file to be converted to json
--framework Framework type. 0:Caffe; 1:MindSpore; 3:Tensorflow; 5:Onnx
--input_format Format of input data. E.g.: "NCHW"
--input_shape Shape of static input data or shape range of dynamic input. Separate multiple nodes with semicolons (;). Use double quotation marks (") to enclose each argument.
E.g.: "input_name1:n1,c1,h1,w1;input_name2:n2,c2,h2,w2"
"input_name1:n1~n2,c1,h1,w1;input_name2:n3~n4,c2,h2,w2"
--input_shape_range This option is deprecated and will be removed in future version, please use input_shape instead.Shape range of input data. Separate multiple nodes with semicolons (;).
Use double quotation marks (") to enclose each argument.
E.g.: "input_name1:[n1~n2,c1,h1,w1];input_name2:[n2,c2~c3,h2,w2]"
--dynamic_batch_size Set dynamic batch size. E.g.: "batchsize1,batchsize2,batchsize3"
--dynamic_image_size Set dynamic image size. Separate multiple nodes with semicolons (;). Use double quotation marks (") to enclose each argument.
E.g.: "imagesize1_height,imagesize1_width;imagesize2_height,imagesize2_width"
--dynamic_dims Set dynamic dims. Separate multiple nodes with semicolons (;). Use double quotation marks (") to enclose each argument.
E.g.: "dims1_n1,dims1_n2;dims2_n1,dims2_n2"
--singleop Single op definition file. atc will generate offline.model(s) for single op if --singleop is set.
--shard_model_dir directory of split models
--model_relation_config relation of split models
--enable_graph_parallel whether to enable parallel
--graph_parallel_option_path parallel strategy configuration file
--cluster_config logic cluster configuration of target environment
[Output]
--output Output file path&name(needn't suffix, will add .om/.exeom automatically).
If --mode is set to 30, an additional dbg file will be generated.
If --singleop is set, this arg specifies the directory to which the single op offline model will be generated.
--output_type Set net output type. Support FP32, FP16, UINT8, INT8. E.g.: FP16, indicates that all out nodes are set to FP16.
"node1:0:FP16;node2:1:FP32", indicates setting the datatype of multiple out nodes.
--check_report The pre-checking report file. Default value is: "check_result.json"
--json The output json file path&name which is converted from a model
--host_env_os OS type of the target execution environment.
The parameters that support setting are the OS types of the opp package
Supported host env os as list:
minios linux
default: linux
--host_env_cpu CPU type of the target execution environment.
The parameters that support setting are the CPU types of the opp package
Supported host env cpu as list:
support cpu: aarch64 , respond to os: minios
support cpu: aarch64 x86_64 , respond to os: linux
default: aarch64
[Target]
--soc_version The soc version.
--virtual_type Set whether offline model can run on the virtual devices under compute capability allocation.
0 (default) : Disable virtualization; 1 : Enable virtualization.
--core_type Set core type AiCore or VectorCore. VectorCore: use vector core. Default value is: AiCore
--aicore_num Set aicore num
===== Advanced Functionality =====
[Feature]
--out_nodes Output nodes designated by users. Separate multiple nodes with semicolons (;).Use double quotation marks (") to enclose each argument.
E.g.: "node_name1:0;node_name1:1;node_name2:0"
--input_fp16_nodes Input node datatype is fp16. Separate multiple nodes with semicolons (;). Use double quotation marks (") to enclose each argument. E.g.: "node_name1;node_name2"
--insert_op_conf Config file to insert new op
--op_name_map Custom op name mapping file
Note: A semicolon(;) cannot be included in each path, otherwise the resolved path will not match the expected one.
--is_input_adjust_hw_layout Input node datatype is fp16 and format is NC1HWC0, used with input_fp16_nodes. true: enable; false(default): disable. E.g.: "true,true,false,true"
--is_output_adjust_hw_layout Net output node datatype is fp16 and format is NC1HWC0, used with out_nodes. true: enable; false(default): disable. E.g.: "true,true,false,true"
--external_weight Convert const to file constant, and save weight in file.
0 (default): save weight in om. 1: save weight in file.
[Model Tuning]
--disable_reuse_memory The switch of reuse memory. Default value is : 0. 0 means reuse memory, 1 means do not reuse memory.
--fusion_switch_file File for fusion rule(graph fusion and UB fusion).
Enter as the configuration file path, disable specified fusion rules
--enable_scope_fusion_passes validate the non-general scope fusion passes, multiple names can be set and separated by ','. E.g.: ScopePass1,ScopePass2,...
--enable_single_stream Enable single stream. true: enable; false(default): disable
--ac_parallel_enable Enable engines such as Aicpu to parallel with other engines in dynamic shape graphs. 1: enable; 0(default): disable
--tiling_schedule_optimize Enable tiling schedule optimize. 1: enable; 0(default): disable
--quant_dumpable Ensure that the input and output of quant nodes can be dumped. 1: enable; 0(default): disable.
--enable_small_channel Set enable small channel. 0(default): disable; 1: enable
--enable_compress_weight Enable compress weight. true: enable; false(default): disable
--compress_weight_conf Config file to compress weight
--compression_optimize_conf Config file to compress optimize
--sparsity Optional; enable structured sparse. 0(default): disable; 1: enable
--buffer_optimize Set buffer optimize. Support "l2_optimize" (default), "l1_optimize", "off_optimize"
--mdl_bank_path Set the path of the custom repository generated after model tuning.
--oo_level The graph optimization level. Support "O1", "O3"(default).
--topo_sorting_mode The option of graph topological sort, 0: BFS; 1: DFS(default); 2: RDFS; 3: StableRDFS(stable topo).
--oo_constant_folding The switch of constant folding, false: disable; true(default): enable.
--oo_dead_code_elimination The switch of dead code elimination, false: disable; true(default): enable.
[Operator Tuning]
--op_precision_mode Set the path of operator precision mode configuration file (.ini)
--allow_hf32 enable hf32. false: disable; true: enable. (not support, reserved)
--precision_mode precision mode, support force_fp16(default), force_fp32, cube_fp16in_fp32out, allow_mix_precision, allow_fp32_to_fp16, must_keep_origin_dtype, allow_mix_precision_fp16, allow_mix_precision_bf16, allow_fp32_to_bf16.
--precision_mode_v2 precision mode v2, support fp16(default), origin, cube_fp16in_fp32out, mixed_float16, mixed_bfloat16, cube_hif8, mixed_hif8.
--modify_mixlist Set the path of operator mixed precision configuration file.
--keep_dtype Retains the precision of certain operators in inference scenarios by using a configuration file.
--customize_dtypes Set the path of custom dtypes configuration file.
--is_weight_clip Ensure weight is finite by cliped when its datatype is floating-point data, 0: disable; 1(default): enable.
--op_bank_path Set the path of the custom repository generated after operator tuning with Auto Tune.
--op_select_implmode Set op select implmode. Support high_precision, high_performance, high_precision_for_all, high_performance_for_all. default: high_performance
--optypelist_for_implmode Appoint which op to select implmode, cooperated with op_select_implmode.
Separate multiple nodes with commas (,). Use double quotation marks (") to enclose each argument. E.g.: "node_name1,node_name2"
[Debug]
--op_debug_level Debug enable for TBE operator building.
0 (default): Disable debug; 1: Enable TBE pipe_all, and generate the operator CCE file and Python-CCE mapping file (.json);
2: Enable TBE pipe_all, generate the operator CCE file and Python-CCE mapping file (.json), and enable the CCE compiler -O0-g.
3: Disable debug, and keep generating kernel file (.o and .json)
4: Disable debug, keep generation kernel file (.o and .json) and generate the operator CCE file (.cce) and the UB fusion computing description file (.json)
--save_original_model Control whether to output original model. E.g.: true: output original model
--log Generate log with level. Support debug, info, warning, error, null(default)
--dump_mode The switch of dump json with shape, to be used with mode 1. 0(default): disable; 1: enable.
--debug_dir Set the save path of operator compilation intermediate files.
Default value: ./kernel_meta
--status_check switch for op status check such as overflow.
0(default): disable; 1: enable.
--op_compiler_cache_dir Set the save path of operator compilation cache files.
Default value: $HOME/atc_data
--op_compiler_cache_mode Set the operator compilation cache mode. Options are disable(default), enable and force(force to refresh the cache)
--display_model_info enable for display model info; 0(default): close display, 1: open display.
--shape_generalized_build_mode For selecting the mode of shape generalization when build graph.
shape_generalized: Shape will be generalized during graph build
shape_precise(default): Shape will not be generalized, use precise shape
--op_debug_config Debug enable for Operator memory detection, enter as the configuration file path.
If option is default, debug for Operator memory detection is disable.
--atomic_clean_policy For selecting the atomic op clean memory policy.
0 (default): centralized clean. 1: separate clean.
--deterministic For deterministic calculation.
0 (default): deterministic off. 1: deterministic on.
--export_compile_stat The option of configuring statistics of the graph compiler, 0: Not Generate; 1: Generated when the program exits(default); 2: Generated when graph compilation complete.
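A typical ONNX-to-om conversion assembled from the options above; the model name, input name/shape, and soc_version are placeholders, so substitute your own (npu-smi info shows the chip):
# framework 5 = ONNX; Ascend910B1 is a placeholder soc_version, match it to your chip
atc --model=model.onnx \
    --framework=5 \
    --output=model \
    --input_shape="input:1,3,224,224" \
    --soc_version=Ascend910B1
# mode 1 converts an existing .om back to JSON for inspection
atc --mode=1 --om=model.om --json=model.json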
To show debug messages:
export ASCEND_SLOG_PRINT_TO_STDOUT=1
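Verbosity is controlled separately by the standard CANN log-level variable:
export ASCEND_GLOBAL_LOG_LEVEL=1   # 0: debug, 1: info, 2: warning, 3: error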
Fixing a version mismatch error
[INFO] acl init success
[INFO] open device 0 success
[INFO] create new context
[ACL ERROR] E19999: Inner Error!
E19999: [PID: 3370899] 2025-10-17-18:40:43.617.804 Invalid opp version [8.3.T14.0.B101] or compiler_version [], Please check if it is within the required range[FUNC:CheckOsCpuInfoAndOppVersion][FILE:model_helper.cc][LINE:973]
        TraceBack (most recent call last):
        Assert ((error_code) == ge::SUCCESS) failed[FUNC:LoadExecutorFromModelData][FILE:api.cc][LINE:111]
        [Model][FromData]call gert::LoadExecutorFromModelDataWithMem load model from data failed, ge result[4294967295][FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161]
[ERROR] load model from file failed, model file is ./encoder.om
[WARN] Check failed:processModel->LoadModelFromFile(modelPath), ret:1
I am using ascendai/cann:latest from https://github.com/Ascend/cann-container-image in GitHub Actions. Inside that container, the toolkit is 8.3.RC1.alpha003:
source /usr/local/Ascend/ascend-toolkit/set_env.sh
/usr/local/Ascend/ascend-toolkit/8.3.RC1.alpha003/fwkacllib/lib64/libascend_protobuf.so
/usr/local/Ascend/ascend-toolkit/8.3.RC1.alpha003/fwkacllib/lib64/libascend_dump.so
And on my device, I have:
cat /usr/local/Ascend/driver/version.info
Version=24.1.0
ascendhal_version=7.35.23
aicpu_version=1.0
tdt_version=1.0
log_version=1.0
prof_version=2.0
dvppkernels_version=1.1
tsfw_version=1.0
Innerversion=V100R001C19SPC002B226
compatible_version=[V100R001C13],[V100R001C15],[V100R001C17],[V100R001C18],[V100R001C19]
compatible_version_fw=[7.0.0,7.6.99]
package_version=24.1.0
Driver 24.1.0 requires CANN 8.0 / 8.1 LTS, but I was using 8.3.RC1.alpha003, which is too new.
Switch to 8.1.rc1-910b-ubuntu22.04-py3.10. See https://github.com/Ascend/cann-container-image/tree/main/cann/8.1.rc1-910b-ubuntu22.04-py3.10
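A minimal smoke test with the matching image; the device nodes and driver mount below follow the usual Ascend container pattern (adjust the davinci index to your host):
docker run --rm -it \
  --device /dev/davinci0 \
  --device /dev/davinci_manager \
  --device /dev/devmm_svm \
  --device /dev/hisi_hdc \
  -v /usr/local/Ascend/driver:/usr/local/Ascend/driver:ro \
  ascendai/cann:8.1.rc1-910b-ubuntu22.04-py3.10 \
  bash -c 'source /usr/local/Ascend/ascend-toolkit/set_env.sh && atc --h'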
Driver/CANN compatibility, roughly:
CANN 7.0.x supports driver 23.0.x (910 / 310)
CANN 7.1.x supports driver 23.1.x and 23.2.x (A310B, 910B)
CANN 8.0.x / 8.1.x supports driver 24.0.x / 24.1.x
CANN 8.3.x supports driver 25.0.x / 25.1.x
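To sanity-check a machine against this list, compare the driver version with the installed toolkit releases (default paths assumed):
grep '^Version=' /usr/local/Ascend/driver/version.info
ls /usr/local/Ascend/ascend-toolkit/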