TVM Development Report – October 2020

The community forum (discuss.tvm.ai) accumulated roughly 142,000 page views in October, with over 2,300 unique visitors. The community also welcomed one new committer (@junrushao1994) and one new reviewer (@areusch). On the features side, we added affine-map utilities to TIR to support loop optimizations and layout operations, and added TensorRT support to BYOC; there was also substantial work on the auto-scheduler, the TVM command-line driver (tvmc), the Rust bindings, and operator and model coverage.

Details and the corresponding PRs are listed below:

Relay IR and TIR

  • [Relay] Mix mode type inference #6704
  • [Relay] Change some passes to mix mode #6695
  • [Relay] support i64 indices #6143
  • [ManifestAlloc] Handle TupleType inputs in CheckReshapeOnly #6776
  • [ARITH] Introduce iterator (quasi)affine map detection. #6667
  • [ARITH] Tight bound for floormod #6771

Operator support

  • [RELAY][OP] Dynamic conv2d batch size for cuda #6598
  • [Relay, TOPI] Complete rewrite of where op to support broadcasting #6759
  • [Topi] Allow batch_matmul to broadcast along batch dimension. #6616
  • [Relay][Training] Add more missing gradients #6767
  • [RELAY][OP] roi_pool operator alter layout #6516
  • Add dot product support for quantized convolution. #6445
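The batch_matmul broadcasting change in #6616 follows the familiar NumPy-style rule: a batch dimension of 1 on one operand is stretched to match the other operand's batch size. A quick NumPy illustration of the intended semantics (NumPy code, not TVM's):

```python
import numpy as np

# NumPy-style batch broadcasting for matmul: an operand with batch dim 1
# is broadcast against the other operand's batch dim (here 1 -> 4).
a = np.ones((4, 3, 5))   # batch of 4 matrices, each 3x5
b = np.ones((1, 5, 2))   # batch of 1, broadcast along the batch axis
c = np.matmul(a, b)
assert c.shape == (4, 3, 2)
```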

Backend

  • [LLVM] Create fixed vector size according to latest LLVM12+ changes #6717
  • [Hexagon] Use nullptr instead of 0 in hexagon_device_sim.cc #6718
  • [LLVM] Avoid warnings when compiling getNumElements with LLVM12+ #6738
  • [Hexagon] Remove use of designated initializers from hexagon_module.cc #6055
  • [LLVM/CPU] Terminate basic block after “ret” instruction #6036
  • [LLVM] Add target feature string to function attributes #6763
  • [OpenCL] Only use thrust for cuda targets #6722
  • Adjust Vulkan queue selection and creation logic #6662
  • [VTA] quant support for alu-only op #6191

BYOC

  • [BYOC] Configurable optimize pass for PartitionGraph #6777
  • [BYOC][TensorRT] TensorRT BYOC integration #6395
  • [BYOC] Allow custom codegens to register their own constant updater #6697
  • [BYOC][ACL] Support add operation #6532
  • [BYOC] Added default_tuples parameter to AnnotateTarget pass #6655
  • [BYOC] Support control flow in annotate_target #6641
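At its core, the BYOC AnnotateTarget/PartitionGraph flow marks the operators an external codegen claims to support and groups maximal supported regions into subgraphs for offloading, leaving the rest to TVM. The toy sketch below illustrates only that grouping idea on a linear op sequence; the helper and the op set are hypothetical, not the actual Relay passes:

```python
# Toy illustration of the BYOC annotate/partition idea on a LINEAR op
# sequence (the real passes work on a Relay dataflow graph).
SUPPORTED = {"conv2d", "relu", "add"}  # ops the external backend claims

def partition(ops, supported=SUPPORTED):
    """Group maximal runs of supported/unsupported ops into regions."""
    regions, current, current_ext = [], [], None
    for op in ops:
        ext = op in supported
        if current and ext != current_ext:
            regions.append((current_ext, current))
            current = []
        current.append(op)
        current_ext = ext
    if current:
        regions.append((current_ext, current))
    return regions

graph = ["conv2d", "relu", "softmax", "add"]
# conv2d+relu are offloaded together, softmax stays on TVM, add is offloaded
assert partition(graph) == [(True, ["conv2d", "relu"]),
                            (False, ["softmax"]),
                            (True, ["add"])]
```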

Ansor, Autoscheduler and AutoTVM

  • [AutoSchedule] Support multiple cache read and fix bugs #6686
  • [Ansor] Support multiple output ops and fix Python API printing #6584
  • [AutoTVM] Load configs even it has no entity #6100
  • [AutoScheduler] Improve the rule of mutating parallel granularity #6568
  • [AutoScheduler] Add task scheduler #6663
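The task scheduler added in #6663 decides how to split a global tuning budget across the tasks extracted from a model. The real implementation uses cost-model-driven strategies; the sketch below only illustrates the budget-allocation idea with a greedy "spend each round on the currently slowest task" rule, and the 0.9 decay is an invented toy model, not AutoScheduler behavior:

```python
# Toy budget allocator, NOT the AutoScheduler implementation: each round,
# give the next batch of tuning trials to the currently slowest task.
def schedule_trials(task_latencies, total_trials, trials_per_round=2):
    """Return per-task trial counts summing to total_trials."""
    latencies = list(task_latencies)
    allocated = [0] * len(latencies)
    remaining = total_trials
    while remaining > 0:
        worst = max(range(len(latencies)), key=lambda i: latencies[i])
        n = min(trials_per_round, remaining)
        allocated[worst] += n
        latencies[worst] *= 0.9  # toy assumption: tuning shrinks latency
        remaining -= n
    return allocated

alloc = schedule_trials([5.0, 1.0, 3.0], total_trials=10)
assert sum(alloc) == 10
assert alloc[0] > alloc[1]  # the slowest task receives the most trials
```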

MicroTVM

  • [µTVM] Avoid use of builtin math functions #6630
  • Add µTVM Zephyr support + QEMU regression test #6603
  • [µTVM] Add serial transport, parameterize µTVM Zephyr test, run on physical HW #6789

Performance

  • Faster sparse_dense on GPUs #6580
  • [Relay] A set of utilities that allows a model to be run efficiently on tensorcores. #6748
  • [QNN] Optimize requantize for power of 2 and fix dequantize for per-channel quantized input #6675
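The requantize optimization in #6675 exploits the fact that when the ratio of input to output scale is an exact power of two, the fixed-point multiply can be replaced by a cheap rounding right shift. A minimal sketch of that shift for nonnegative values (illustrative only, not TVM's QNN lowering):

```python
# Illustrative power-of-two requantize: value / 2**shift with
# round-to-nearest, shown for nonnegative inputs only.
def requantize_pow2(value, shift):
    """Rounding right shift: nearest integer to value / 2**shift."""
    rounding = 1 << (shift - 1)  # add half of the divisor before shifting
    return (value + rounding) >> shift

assert requantize_pow2(100, 2) == 25   # 100 / 4 = 25
assert requantize_pow2(102, 2) == 26   # 102 / 4 = 25.5, rounds up
```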

Runtime

  • [RELAY][VM] Enable heterogeneous execution for Relay VM #6337
  • [Relay][VM] Add support for references. #6798
  • Updated runtime to run under FreeBSD. #6600

TVMC

  • [tvmc] unify all logs on a single logger ‘TVMC’ #6577
  • [TVMC] use common function to obtain target from --target value on ‘tvmc compile’ #6788
  • [TVMC] ‘tvmc run’ --rpc-tracker and --rpc-key fail due to argparse misconfiguration #6762
  • [tvmc] Introduce ‘run’ subcommand (part 4/4) #6578
  • [tvmc] fix command line argument variable name in ‘compile’ #6574
  • [tvmc] command line driver ‘compile’ (part 2/4) #6302
  • [TVMC] fail gracefully in case no subcommand is provided #6625
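The tvmc work above builds out a subcommand-style command-line driver (`compile`, `run`, ...). The toy argparse sketch below shows that structure, including the graceful failure when no subcommand is given (#6625); the option names here are illustrative and far from the full tvmc interface:

```python
# Toy sketch of a tvmc-style subcommand driver; NOT the real tvmc CLI.
import argparse

def make_parser():
    parser = argparse.ArgumentParser(prog="tvmc")
    sub = parser.add_subparsers(dest="subcommand")
    comp = sub.add_parser("compile")
    comp.add_argument("--target", default="llvm")  # illustrative option
    run = sub.add_parser("run")
    run.add_argument("--rpc-tracker")              # illustrative option
    return parser

def main(argv):
    args = make_parser().parse_args(argv)
    if args.subcommand is None:
        return "error: no subcommand provided"  # fail gracefully (#6625)
    return args.subcommand

assert main(["compile", "--target", "llvm"]) == "compile"
assert main([]) == "error: no subcommand provided"
```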

Frontend

  • Add more Rust bindings #6678
  • [Rust] Improve NDArray, GraphRt, and Relay bindings #6563

Torch

  • [Torch, Quantization] Necessary workaround to prepare for 1.6 update #6602
  • [Torch, QNN] Support dynamic quantization flow to enable importing quantized transformer models #6782
  • [Torch] Object detection support update for PyTorch 1.6 #6659
  • [Torch] Support bincount and scatter_add ops #6740

Tensorflow

  • TF argmax – handling int64 datatype #6674
  • [TFLite, QNN] Slice op #6217
  • [TFLite] Fix detection of crop in convert_batch_to_space_nd #6670
  • [TENSORFLOW]TF Addons activations support added #5472
  • [Frontend][Tensorflow] Fix TF 1.15 conv2d_transpose parsing #6589
  • TF frontend: add expm1 op #6783

ONNX

  • [Relay][Frontend][Onnx] Allow A to B broadcasting of batch_matmul and reverse strided slice #6681
  • [Relay][Frontend][Onnx] Loop Support #6700

MXNet

  • [Relay][MXNet] Support broadcast_like #6561
  • [TOPI][RELAY][MXNET] Reverse/Flip operator #5513
  • [RELAY][TOPI][MXNET]Sequence_last op support added #5994

Refactor and API changes

  • [REFACTOR] Remainings of util => utils #6778
  • Migrate IntImm & FloatImm ObjectRef to not-null #5788
  • [REFACTOR][Relay] Migrate Id ObjectRef to not-null #5748
  • [Diagnostics][Relay][InferType] Refactor InferType to work on whole module, and use new diagnostics. #6274
  • Refactor diagnostic to avoid circular dependencies #6692
  • Replace CHECK* with ICHECK* #6745
  • [API] Added remove_global_func to the Python API #6787
  • [RELAY] Refactor FoldConstant to skip TNonComputationalOps #6720
  • [TVMScript] refactor #6734

Build and CI

  • [CI] Update wasm emcc to latest #6755
  • [CI] Move to use main as the default #6665
  • [CI] CI docker staging update to latest #6708
  • [CI] Introduce all platform test for windows/mac/linux. #6756
  • [CI] Update ci-wasm to latest #6772
  • [CI] add python environment setup as part of cpp unittest runner script #6639
  • [TEST][TEDD] improve TEDD tests to also run on CPU Docker image #6643
  • [CI] Pin h5py version to < 3.0 to workaround issues with TF/Keras #6808
  • [TEST][CI] make sure graphviz is on both ci-cpu and ci-gpu images #6645
  • properly pass through command-line args in docker/bash.sh #6599
  • add black-format to docker/lint.sh, support in-place format #6601
  • [apps/bundle_deploy] Link demo_* targets with LDFLAGS and also with -lm. #6636
  • Add qemu build step to CI #6644
  • Add ci_qemu docker image #6485
  • Improve interactive docker/bash.sh #6344
  • Add cloudpickle dependency to docker images #6701
  • [Bugfix] Auto scheduler tutorial failure on CI #6723
  • [CI] Add m6g instance (ARM64) to CI #6781
  • [CI] fix cpp test #6796
  • [CI] Add m6g instance (ARM64) to CI #6780
  • [CI] Keras version upgraded from 2.3.1 to 2.4.3 #6793
  • [CI] Tensorflow version support upgrade from 2.1.0 to 2.3.1 #6706
  • [Docker] Turn on Rust docs and MxNet based ResNet #6640
  • [Docker] Fix tutorial broken by Docker build #6694
  • [Docker] Update CI CPU and GPU images based on new Docker build files. #6690
  • [CI] Install xgboost>=1.1.0 in CI container #6679
  • More CHECK to ICHECK #6758
  • Fix example code #6627
  • [Relay] Fix Strided Slice Infer Layout #6621
  • [CI] Update ci-cpu to the latest #6632
  • Add pytest-xdist and pytest-profiling to the base installation packages. #6736
  • [BUG_FIX] Fixes #6608: CHECK(data != nullptr) causes type checking to fail #6610
  • [Docker][CI][BYODT] add universal to Docker image #6654
  • update pyxir version to 0.1.3 #6769
  • Update to 20.08 version of the ethosn-driver. #6606
  • [CI] Set main as default in github actions #6669

Doc

  • [docs] Missing documentation dependency ‘autodocsumm’ on docs/README.txt #6595
  • [tvmc][docs] Getting started tutorial for TVMC #6597
  • [DOCS] Update has_dtype/has_shape to pattern lang doc #5847
  • [AutoScheduler] Use tempfile in tutorials #6728
  • [AutoScheduler] Improve the GPU tutorial by deleting measure_ctx earlier #6660
  • [AutoScheduler] Re-organize logs files for tutorials #6768
  • [Tutorial – QNN] Prequantized MXNet model compilation. #5362

Improvement and Bugfix

  • [PYTHON][WINDOWS] More robust dll loading behavior after python3.8 #6707
  • [LLVM][WINDOWS] Recover windows support for the latest LLVM #6698
  • Resolve more warnings in msvc #6702
  • [CONDA] Revamp conda recipe. #6732
  • [FFI][BUGFIX] Fix memory leak when PackedFunc callback argument is NDArray #6744
  • [WASM] Update support for latest emcc, add ffi test. #6751
  • [FIX,MICROTVM] Skip microtvm tests if microtvm is not built #6693
  • [FIX,AUTOTVM] Print warning when all autotvm tasks fail with errors #6612
  • [FIX,MICROTVM] Add requires_micro decorators to microtvm tests #6747
  • [FIX,AUTOSCHEDULER] Fix auto_scheduler to run with multiprocessing’s spawn start method #6671
  • [FIX,PYLINT] Fix pylint errors on MacOS with Python 3.8 #6746
  • [TVMSCRIPT] Add synr dependency in preparation for tvmscript diagnostic overhaul #6795
  • [FIX,AUTOTVM] More descriptive error message when an autotvm task is not found #6652
  • [FIX][AUTOTVM] Make autotvm work with spawn #6790
  • [FIX,CMAKE] Use set_property with append flag instead of set_target_properties #6725
  • [AutoScheduler] Improve test cases #6657
  • [AutoScheduler] Fix a bug in thread binding #6683
  • [AutoScheduler] Fix mutate auto unroll #6807
  • [MKL] Fix offloading of batch_matmul to MKL #6752
  • [ConvertLayout] Fix Strided Slice #6619
  • [Topi][Cuda] Tiny bug fix for non-fp32 datatypes in conv2d_transpose. #6593
  • [Contrib][TRT] Fix Conv2D construction when channels attribute is not available. #6805
  • TFLite failures resulted from TF latest version upgrade resolved #6774
  • [Relay] Minor fix for some TF OD models #6729
  • [Relay] Fix dynamic case for Squeeze and Split #6739
  • [FIX] Fix cublas batch matmul #6715
  • [Frontend][Relay] Fix MXNet frontend to support NLP backbones in GluonNLP #6699
  • [Bugfix] Simplify reduce expression in te.gradient #6611
  • [ARITH] iter_affine_map bug fix, stride generalize #6753
  • Fix version check bug #6784
  • Fix leakyReLU support for CoreML #6651
  • fix a bug in convertSSA. #6785
  • Fix the Type bug in ConvertSSA. #6709
  • Fix format error in integrate.rst #6677
  • [Fix,Conda] update conda download url #6760
  • [CODEGEN][COREML] Call InferType explicitly in coreml test #6676
  • [AutoTVM][TOPI] Fix bifrost spatial packing conv2d auto tune #5684
  • Fix typographical error. #6664
  • Missing header for GraphRuntimeFactory in android_rpc #6648
  • [BUGFIX] Fix topi matrix multiplication using tensorcore to run faster #6749

People Who Reviewed Pull Requests:

Note: the format is name (number of activities). Disclaimer: the number of activities does not directly correspond to the community’s view of the significance of contributions.

tqchen (98), zhiics (34), comaniac (33), junrushao1994 (33), jroesch (22), tmoreau89 (21), ZihengJiang (17), leandron (16), masahi (15), mbrookhart (12), merrymercy (9), anijain2305 (9), kevinthesun (9), u99127 (9), icemelon9 (8), FrozenGene (8), jwfromm (8), MarisaKirisame (7), siju-samuel (5), mbaret (5), tkonolige (5), jcf94 (5), Laurawly (4), trevor-m (4), giuseros (4), electriclilies (4), liangfu (3), areusch (3), lhutton1 (3), cbalint13 (3), rkimball (3), manupa-arm (3), hogepodge (3), yzhliu (2), wweic (2), t-vi (2), yongwww (2), ANSHUMAN87 (2), maheshambule (2), hypercubestart (2), csullivan (2), tom-gall (2), altanh (2), vinx13 (1), nhynes (1), lixiaoquan (1), kparzysz-quic (1), Huyuwei (1), slyubomirsky (1), vegaluisjose (1), soiferj (1), ajtulloch (1), weberlo (1), antinucleon (1), spectrometerHBH (1), adityaatluri (1), jmorrill (1), yongfeng-nv (1), Hzfengsy (1), ptrendx (1), yuluny2 (1), Shawn-Inspur (1)

People Whose Pull Requests are Updated:

Note: The format is name (number of activities)

tqchen (19), leandron (13), areusch (12), tkonolige (11), comaniac (9), zhiics (8), mbrookhart (8), merrymercy (7), masahi (7), anijain2305 (7), jwfromm (6), kparzysz-quic (6), ANSHUMAN87 (6), jroesch (5), trevor-m (5), siju-samuel (4), rkimball (4), zhanghaohit (4), kevinthesun (3), lixiaoquan (3), sxjscience (3), ZihengJiang (2), yzhliu (2), tmoreau89 (2), mbaret (2), u99127 (2), d-smirnov (2), electriclilies (2), hypercubestart (2), spectrometerHBH (2), gussmith23 (2), codeislife99 (2), hzfan (2), ishitatsuyuki (2), lsy643 (2), Presburger (2), qixiuai (2), shibuiwilliam (2), jtuyls (2), rohanmukh (2), MarisaKirisame (1), kazum (1), slyubomirsky (1), yongwww (1), cchung100m (1), cbalint13 (1), wpan11nv (1), giuseros (1), maheshambule (1), jmorrill (1), altanh (1), cloud-mxd (1), mwillsey (1), dpankratz (1), tristan-arm (1), ptrendx (1), Meteorix (1), ghostplant (1), Beya2019 (1), alter-xp (1), cylinbao (1), cgyurgyik (1), hogepodge (1), nolanliou (1), qiangxu1996 (1), MasterJH5574 (1), zhiqwang (1), anilmartha (1), chinakook (1), iiahim (1)
