Basics

npu-smi

npu-smi info

# It's similar to nvidia-smi
yyy@xxx:~$ npu-smi info

+------------------------------------------------------------------------------------------------+
| npu-smi 24.1.0                   Version: 24.1.0                                               |
+---------------------------+---------------+----------------------------------------------------+
| NPU   Name                | Health        | Power(W)    Temp(C)           Hugepages-Usage(page)|
| Chip                      | Bus-Id        | AICore(%)   Memory-Usage(MB)  HBM-Usage(MB)        |
+===========================+===============+====================================================+
| 0     910B                | OK            | xx.x        xxx               0    / 0             |
| 0                         | 1234:d0:12.1  | 0           xxxx / 1xxxx      0    / xxxxx         |
+===========================+===============+====================================================+
| NPU     Chip              | Process id    | Process name             | Process memory(MB)      |
+===========================+===============+====================================================+
| No running processes found in NPU 0                                                            |
+===========================+===============+====================================================+

atc

yyy@xxx:~$ cat /usr/local/Ascend/ascend-toolkit/set_env.sh

export LD_LIBRARY_PATH=/usr/local/Ascend/driver/lib64:/usr/local/Ascend/driver/lib64/common:/usr/local/Ascend/driver/lib64/driver:$LD_LIBRARY_PATH
export ASCEND_TOOLKIT_HOME=/usr/local/Ascend/ascend-toolkit/latest
export LD_LIBRARY_PATH=${ASCEND_TOOLKIT_HOME}/lib64:${ASCEND_TOOLKIT_HOME}/lib64/plugin/opskernel:${ASCEND_TOOLKIT_HOME}/lib64/plugin/nnengine:${ASCEND_TOOLKIT_HOME}/opp/built-in/op_impl/ai_core/tbe/op_tiling/lib/linux/$(arch):$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=${ASCEND_TOOLKIT_HOME}/tools/aml/lib64:${ASCEND_TOOLKIT_HOME}/tools/aml/lib64/plugin:$LD_LIBRARY_PATH
export PYTHONPATH=${ASCEND_TOOLKIT_HOME}/python/site-packages:${ASCEND_TOOLKIT_HOME}/opp/built-in/op_impl/ai_core/tbe:$PYTHONPATH
export PATH=${ASCEND_TOOLKIT_HOME}/bin:${ASCEND_TOOLKIT_HOME}/compiler/ccec_compiler/bin:${ASCEND_TOOLKIT_HOME}/tools/ccec_compiler/bin:$PATH
export ASCEND_AICPU_PATH=${ASCEND_TOOLKIT_HOME}
export ASCEND_OPP_PATH=${ASCEND_TOOLKIT_HOME}/opp
export TOOLCHAIN_HOME=${ASCEND_TOOLKIT_HOME}/toolkit
export ASCEND_HOME_PATH=${ASCEND_TOOLKIT_HOME}
cat /usr/local/Ascend/ascend-toolkit/latest/version.cfg

# version: 1.0
runtime_running_version=[7.6.0.1.220:8.0.0]
compiler_running_version=[7.6.0.1.220:8.0.0]
hccl_running_version=[7.6.0.1.220:8.0.0]
opp_running_version=[7.6.0.1.220:8.0.0]
toolkit_running_version=[7.6.0.1.220:8.0.0]
aoe_running_version=[7.6.0.1.220:8.0.0]
ncs_running_version=[7.6.0.1.220:8.0.0]
runtime_upgrade_version=[7.6.0.1.220:8.0.0]
compiler_upgrade_version=[7.6.0.1.220:8.0.0]
hccl_upgrade_version=[7.6.0.1.220:8.0.0]
opp_upgrade_version=[7.6.0.1.220:8.0.0]
toolkit_upgrade_version=[7.6.0.1.220:8.0.0]
aoe_upgrade_version=[7.6.0.1.220:8.0.0]
ncs_upgrade_version=[7.6.0.1.220:8.0.0]
runtime_installed_version=[7.6.0.1.220:8.0.0]
compiler_installed_version=[7.6.0.1.220:8.0.0]
hccl_installed_version=[7.6.0.1.220:8.0.0]
opp_installed_version=[7.6.0.1.220:8.0.0]
toolkit_installed_version=[7.6.0.1.220:8.0.0]
aoe_installed_version=[7.6.0.1.220:8.0.0]
ncs_installed_version=[7.6.0.1.220:8.0.0]
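Each entry in version.cfg uses an `[inner-build:release]` bracket pair; the part after the colon is the user-facing CANN version. A minimal sketch for pulling it out (the sample line is copied from the output above; in practice read the line from version.cfg itself):

```shell
# Extract the release version (the part after the colon) from a
# version.cfg entry of the form key=[inner-build:release].
# Replace cfg_line with a line read from
# /usr/local/Ascend/ascend-toolkit/latest/version.cfg on a real system.
cfg_line='toolkit_installed_version=[7.6.0.1.220:8.0.0]'
cann_version=$(printf '%s\n' "$cfg_line" | sed -E 's/.*:([0-9.]+)\]$/\1/')
echo "$cann_version"   # prints 8.0.0
```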
echo $PATH | grep ascend-toolkit

/usr/local/Ascend/ascend-toolkit/latest/bin:/usr/local/Ascend/ascend-toolkit/latest/compiler/ccec_compiler/bin:/usr/local/Ascend/ascend-toolkit/latest/tools/ccec_compiler/bin
ls /usr/local/Ascend/ascend-toolkit/

8.0  8.0.0  latest  set_env.sh

atc --help

atc is the Ascend Tensor Compiler; it converts trained models into offline .om models.

ATC start working now, please wait for a moment.
usage: atc <args>
generate offline model example:
atc --model=./alexnet.prototxt --weight=./alexnet.caffemodel --framework=0 --output=./domi --soc_version=<soc_version> 
generate offline model for single op example:
atc --singleop=./op_list.json --output=./op_model --soc_version=<soc_version> 

===== Basic Functionality =====
[General]
  --h/help            Show this help message
  --mode              Run mode.
                       0: default, generate offline model;
                       1: convert model to JSON format;
                       3: only pre-check;
                       5: convert ge dump txt file to JSON format;
                       6: display model info;
                       30: convert original graph to execute-om for nano(offline model)

[Input]
  --distributed_cluster_build      build model for distribute mode, 1: enable distribute; 0(default): disable distribute
  --model             Model file
  --weight            Weight file. Required when framework is Caffe
  --om                The model file to be converted to json
  --framework         Framework type. 0:Caffe; 1:MindSpore; 3:Tensorflow; 5:Onnx
  --input_format      Format of input data. E.g.: "NCHW"
  --input_shape       Shape of static input data or shape range of dynamic input. Separate multiple nodes with semicolons (;). Use double quotation marks (") to enclose each argument.
                      E.g.: "input_name1:n1,c1,h1,w1;input_name2:n2,c2,h2,w2"
                            "input_name1:n1~n2,c1,h1,w1;input_name2:n3~n4,c2,h2,w2"
  --input_shape_range This option is deprecated and will be removed in future version, please use input_shape instead.Shape range of input data. Separate multiple nodes with semicolons (;).
                      Use double quotation marks (") to enclose each argument.
                      E.g.: "input_name1:[n1~n2,c1,h1,w1];input_name2:[n2,c2~c3,h2,w2]"
  --dynamic_batch_size Set dynamic batch size. E.g.: "batchsize1,batchsize2,batchsize3"
  --dynamic_image_size Set dynamic image size. Separate multiple nodes with semicolons (;). Use double quotation marks (") to enclose each argument.
                       E.g.: "imagesize1_height,imagesize1_width;imagesize2_height,imagesize2_width"
  --dynamic_dims      Set dynamic dims. Separate multiple nodes with semicolons (;). Use double quotation marks (") to enclose each argument.
                       E.g.: "dims1_n1,dims1_n2;dims2_n1,dims2_n2"
  --singleop          Single op definition file. atc will generate offline.model(s) for single op if --singleop is set.
  --shard_model_dir              directory of split models
  --model_relation_config        relation of split models
  --enable_graph_parallel        whether to enable parallel
  --graph_parallel_option_path   parallel strategy configuration file
  --cluster_config               logic cluster configuration of target environment

[Output]
  --output            Output file path&name(needn't suffix, will add .om/.exeom automatically).
                      If --mode is set to 30, an additional dbg file will be generated.
                      If --singleop is set, this arg specifies the directory to which the single op offline model will be generated.
  --output_type       Set net output type. Support FP32, FP16, UINT8, INT8. E.g.: FP16, indicates that all out nodes are set to FP16.
                      "node1:0:FP16;node2:1:FP32", indicates setting the datatype of multiple out nodes.
  --check_report      The pre-checking report file. Default value is: "check_result.json"
  --json              The output json file path&name which is converted from a model
  --host_env_os            OS type of the target execution environment.
                           The parameters that support setting are the OS types of the opp package
                           Supported host env os as list:
                           minios linux 
                           default: linux
  --host_env_cpu           CPU type of the target execution environment.
                           The parameters that support setting are the CPU types of the opp package
                           Supported host env cpu as list:
                           support cpu: aarch64 , respond to os: minios
                           support cpu: aarch64 x86_64 , respond to os: linux
                           default: aarch64

[Target]
  --soc_version       The soc version.
  --virtual_type      Set whether offline model can run on the virtual devices under compute capability allocation.
                      0 (default) : Disable virtualization; 1 : Enable virtualization.
  --core_type         Set core type AiCore or VectorCore. VectorCore: use vector core. Default value is: AiCore
  --aicore_num        Set aicore num
===== Advanced Functionality =====
[Feature]
  --out_nodes         Output nodes designated by users. Separate multiple nodes with semicolons (;).Use double quotation marks (") to enclose each argument.
                      E.g.: "node_name1:0;node_name1:1;node_name2:0"
  --input_fp16_nodes  Input node datatype is fp16. Separate multiple nodes with semicolons (;). Use double quotation marks (") to enclose each argument. E.g.: "node_name1;node_name2"
  --insert_op_conf    Config file to insert new op
  --op_name_map       Custom op name mapping file
                      Note: A semicolon(;) cannot be included in each path, otherwise the resolved path will not match the expected one.
  --is_input_adjust_hw_layout    Input node datatype is fp16 and format is NC1HWC0, used with input_fp16_nodes. true: enable; false(default): disable. E.g.: "true,true,false,true"
  --is_output_adjust_hw_layout   Net output node datatype is fp16 and format is NC1HWC0, used with out_nodes. true: enable; false(default): disable. E.g.: "true,true,false,true"
  --external_weight        Convert const to file constant, and save weight in file.
                           0 (default): save weight in om.  1: save weight in file.

[Model Tuning]
  --disable_reuse_memory    The switch of reuse memory. Default value is : 0. 0 means reuse memory, 1 means do not reuse memory.
  --fusion_switch_file      File for fusion rule(graph fusion and UB fusion).
                            Enter as the configuration file path, disable specified fusion rules
  --enable_scope_fusion_passes    validate the non-general scope fusion passes, multiple names can be set and separated by ','. E.g.: ScopePass1,ScopePass2,...
  --enable_single_stream    Enable single stream. true: enable; false(default): disable
  --ac_parallel_enable      Enable engines such as Aicpu to parallel with other engines in dynamic shape graphs. 1: enable; 0(default): disable
  --tiling_schedule_optimize Enable tiling schedule optimize. 1: enable; 0(default): disable
  --quant_dumpable          Ensure that the input and output of quant nodes can be dumped. 1: enable; 0(default): disable.
  --enable_small_channel    Set enable small channel. 0(default): disable; 1: enable
  --enable_compress_weight  Enable compress weight. true: enable; false(default): disable
  --compress_weight_conf    Config file to compress weight
  --compression_optimize_conf    Config file to compress optimize
  --sparsity                Optional; enable structured sparse. 0(default): disable; 1: enable
  --buffer_optimize         Set buffer optimize. Support "l2_optimize" (default), "l1_optimize", "off_optimize"
  --mdl_bank_path           Set the path of the custom repository generated after model tuning.
  --oo_level                The graph optimization level. Support "O1", "O3"(default).
  --topo_sorting_mode           The option of graph topological sort, 0: BFS; 1: DFS(default); 2: RDFS; 3: StableRDFS(stable topo).
  --oo_constant_folding           The switch of constant folding, false: disable; true(default): enable.
  --oo_dead_code_elimination           The switch of dead code elimination, false: disable; true(default): enable.

[Operator Tuning]
  --op_precision_mode     Set the path of operator precision mode configuration file (.ini)
  --allow_hf32            enable hf32. false: disable; true: enable. (not support, reserved)
  --precision_mode        precision mode, support force_fp16(default), force_fp32, cube_fp16in_fp32out, allow_mix_precision, allow_fp32_to_fp16, must_keep_origin_dtype, allow_mix_precision_fp16, allow_mix_precision_bf16, allow_fp32_to_bf16.
  --precision_mode_v2     precision mode v2, support fp16(default), origin, cube_fp16in_fp32out, mixed_float16, mixed_bfloat16, cube_hif8, mixed_hif8.
  --modify_mixlist        Set the path of operator mixed precision configuration file.
  --keep_dtype            Retains the precision of certain operators in inference scenarios by using a configuration file.
  --customize_dtypes      Set the path of custom dtypes configuration file.
  --is_weight_clip        Ensure weight is finite by cliped when its datatype is floating-point data, 0: disable; 1(default): enable.
  --op_bank_path          Set the path of the custom repository generated after operator tuning with Auto Tune.
  --op_select_implmode    Set op select implmode. Support high_precision, high_performance, high_precision_for_all, high_performance_for_all. default: high_performance
  --optypelist_for_implmode    Appoint which op to select implmode, cooperated with op_select_implmode.
                               Separate multiple nodes with commas (,). Use double quotation marks (") to enclose each argument. E.g.: "node_name1,node_name2"

[Debug]
  --op_debug_level        Debug enable for TBE operator building.
                          0 (default): Disable debug; 1: Enable TBE pipe_all, and generate the operator CCE file and Python-CCE mapping file (.json);
                          2: Enable TBE pipe_all, generate the operator CCE file and Python-CCE mapping file (.json), and enable the CCE compiler -O0-g.
                          3: Disable debug, and keep generating kernel file (.o and .json)
                          4: Disable debug, keep generation kernel file (.o and .json) and generate the operator CCE file (.cce) and the UB fusion computing description file (.json)
  --save_original_model   Control whether to output original model. E.g.: true: output original model
  --log                   Generate log with level. Support debug, info, warning, error, null(default)
  --dump_mode             The switch of dump json with shape, to be used with mode 1. 0(default): disable; 1: enable.
  --debug_dir             Set the save path of operator compilation intermediate files.
                          Default value: ./kernel_meta
  --status_check             switch for op status check such as overflow.
                             0(default): disable; 1: enable.
  --op_compiler_cache_dir    Set the save path of operator compilation cache files.
                             Default value: $HOME/atc_data
  --op_compiler_cache_mode   Set the operator compilation cache mode. Options are disable(default), enable and force(force to refresh the cache)
  --display_model_info     enable for display model info; 0(default): close display, 1: open display.
  --shape_generalized_build_mode    For selecting the mode of shape generalization when build graph.
                                    shape_generalized: Shape will be generalized during graph build
                                    shape_precise(default): Shape will not be generalized, use precise shape
  --op_debug_config        Debug enable for Operator memory detection, enter as the configuration file path.
                           If option is default, debug for Operator memory detection is disable. 
  --atomic_clean_policy    For selecting the atomic op clean memory policy.
                           0 (default): centralized clean.  1: separate clean.
  --deterministic          For deterministic calculation.
                           0 (default): deterministic off. 1: deterministic on.
  --export_compile_stat           The option of configuring statistics of the graph compiler, 0: Not Generate; 1: Generated when the program exits(default); 2: Generated when graph compilation complete.
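For reference, a typical ONNX conversion pulls these flags together. A hedged sketch — the model path, input name/shape, and soc_version are placeholders, not taken from this machine (check yours with npu-smi info):

```shell
# Hypothetical ONNX → OM conversion; adjust paths, input_shape, and
# soc_version to your model and device before running.
atc --model=./model.onnx \
    --framework=5 \
    --input_shape="input:1,3,224,224" \
    --output=./model \
    --soc_version=Ascend910B \
    --log=error
```

Per the help text above, --framework=5 selects ONNX, and the suffix-less --output path gets .om appended automatically.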

atc

To show debug messages:

export ASCEND_SLOG_PRINT_TO_STDOUT=1
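ASCEND_SLOG_PRINT_TO_STDOUT only redirects where logs go; verbosity is a separate knob. A hedged sketch — ASCEND_GLOBAL_LOG_LEVEL is my assumption of the usual companion variable, so check your CANN version's environment variable reference:

```shell
# Redirect CANN host/device logs to stdout (from the note above)
export ASCEND_SLOG_PRINT_TO_STDOUT=1
# Assumed companion knob for verbosity: 0=debug, 1=info, 2=warning, 3=error
export ASCEND_GLOBAL_LOG_LEVEL=1
```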

Fix errors

[INFO] acl init success
[INFO] open device 0 success
[INFO] create new context
[ACL ERROR] E19999: Inner Error! E19999:
[PID: 3370899] 2025-10-17-18:40:43.617.804 Invalid opp version [8.3.T14.0.B101] or
compiler_version [],Please check if it is within the required
range[FUNC:CheckOsCpuInfoAndOppVersion][FILE:model_helper.cc][LINE:973]
TraceBack (most recent call last): Assert ((error_code) == ge::SUCCESS)
failed[FUNC:LoadExecutorFromModelData][FILE:api.cc][LINE:111]
[Model][FromData]call gert::LoadExecutorFromModelDataWithMem load model from data failed,
ge result[4294967295][FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161]
[ERROR] load model from file failed, model file is ./encoder.om
[WARN] Check failed:processModel->LoadModelFromFile(modelPath), ret:1

I was using ascendai/cann:latest from https://github.com/Ascend/cann-container-image in GitHub Actions. It shows:

source /usr/local/Ascend/ascend-toolkit/set_env.sh

/usr/local/Ascend/ascend-toolkit/8.3.RC1.alpha003/fwkacllib/lib64/libascend_protobuf.so
/usr/local/Ascend/ascend-toolkit/8.3.RC1.alpha003/fwkacllib/lib64/libascend_dump.so

And on my device, I have:

cat /usr/local/Ascend/driver/version.info

Version=24.1.0
ascendhal_version=7.35.23
aicpu_version=1.0
tdt_version=1.0
log_version=1.0
prof_version=2.0
dvppkernels_version=1.1
tsfw_version=1.0
Innerversion=V100R001C19SPC002B226
compatible_version=[V100R001C13],[V100R001C15],[V100R001C17],[V100R001C18],[V100R001C19]
compatible_version_fw=[7.0.0,7.6.99]
package_version=24.1.0

Driver 24.1.0 requires CANN 8.0 / 8.1 LTS, but I was using 8.3.RC1.alpha003, which is too new.

Switch to 8.1.rc1-910b-ubuntu22.04-py3.10. See https://github.com/Ascend/cann-container-image/tree/main/cann/8.1.rc1-910b-ubuntu22.04-py3.10

  • CANN 7.0.x supports driver 23.0.x (910 / 310)

  • CANN 7.1.x supports driver 23.1.x and 23.2.x (A310B, 910B)

  • CANN 8.0.x / 8.1.x supports driver 24.0.x / 24.1.x

  • CANN 8.3.x supports driver 25.0.x / 25.1.x
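The compatibility list above can be sketched as a small lookup helper (hypothetical function; the mapping is just the list restated, so verify it against the release notes for your exact driver):

```shell
# Map a driver version (as reported by /usr/local/Ascend/driver/version.info)
# to the CANN series it supports, per the compatibility list above.
supported_cann() {
    case "$1" in
        23.0.*)        echo "7.0.x" ;;
        23.1.*|23.2.*) echo "7.1.x" ;;
        24.0.*|24.1.*) echo "8.0.x / 8.1.x" ;;
        25.0.*|25.1.*) echo "8.3.x" ;;
        *)             echo "unknown" ;;
    esac
}
supported_cann 24.1.0   # prints "8.0.x / 8.1.x"
```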