Databricks Runtime 7.3 LTS for Machine Learning (EoS)
Note
Support for this Databricks Runtime version has ended. For the end-of-support date, see End-of-support history. For all supported Databricks Runtime versions, see Databricks Runtime release notes versions and compatibility.
Databricks released this version in September 2020. It was declared Long Term Support (LTS) in October 2020.
Databricks Runtime 7.3 LTS for Machine Learning provides a ready-to-go environment for machine learning and data science based on Databricks Runtime 7.3 LTS (EoS). Databricks Runtime ML contains many popular machine learning libraries, including TensorFlow, PyTorch, and XGBoost. It also supports distributed deep learning training using Horovod.
For more information, including instructions for creating a Databricks Runtime ML cluster, see AI and machine learning on Databricks.
For help with migration from Databricks Runtime 6.x, see Databricks Runtime 7.x migration guide (EoS).
New features and major changes
Databricks Runtime 7.3 LTS for Machine Learning is built on top of Databricks Runtime 7.3 LTS. For information on what's new in Databricks Runtime 7.3 LTS, including Apache Spark MLlib and SparkR, see the Databricks Runtime 7.3 LTS (EoS) release notes.
Major changes to the Databricks Runtime ML Python environment
Conda activation on workers
Previously, when you updated the notebook environment using %conda, the new environment was not activated on worker Python processes. This caused issues if a PySpark UDF called a third-party function that used resources installed inside the Conda environment. This limitation no longer exists.
You should also review the major changes to the Databricks Runtime Python environment in Databricks Runtime 7.3 LTS (EoS). For a full list of installed Python packages and their versions, see Python libraries.
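To see which Python environment a given process is running in, a small standard-library helper such as the one below can be useful. This is only a sketch: the function name `describe_python_env` is hypothetical, and the `CONDA_DEFAULT_ENV` variable is set by `conda activate` in general, so it may be absent in some processes.

```python
import os
import sys


def describe_python_env():
    """Return basic facts about the Python interpreter running this code."""
    return {
        # Path of the interpreter that is executing this function.
        "executable": sys.executable,
        # Root directory of the active environment.
        "prefix": sys.prefix,
        # Interpreter version as "major.minor.micro".
        "version": "%d.%d.%d" % sys.version_info[:3],
        # Set by `conda activate`; None when the variable is not defined.
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
    }


if __name__ == "__main__":
    for key, value in sorted(describe_python_env().items()):
        print(key, "=", value)
```

One possible use is to call the helper once on the driver and once inside a PySpark UDF, then compare the `prefix` values to check that worker processes picked up the notebook's Conda environment.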
Updated Python packages
- MLflow 1.9.1 -> 1.11.0
- TensorFlow 2.2.0 -> 2.3.0
- TensorBoard 2.2.2 -> 2.3.0
- PyTorch 1.5.1 -> 1.6.0
- Torchvision 0.6.1 -> 0.7.0
- Petastorm 0.9.2 -> 0.9.5
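If you want to confirm that a cluster carries these versions, a check along the following lines may help. It is a minimal sketch, not a Databricks API: `installed_version` and `check_versions` are hypothetical helpers, and the distribution names in `EXPECTED` (for example `torch` for PyTorch) are assumptions about how the packages register themselves. TensorFlow is left out because its distribution name differs between CPU and GPU clusters.

```python
def installed_version(package_name):
    """Return the installed version of a package, or None if it is missing."""
    try:
        # Available on Python 3.8+; Python 3.7 falls back to pkg_resources.
        from importlib import metadata
    except ImportError:
        metadata = None
    if metadata is not None:
        try:
            return metadata.version(package_name)
        except metadata.PackageNotFoundError:
            return None
    import pkg_resources
    try:
        return pkg_resources.get_distribution(package_name).version
    except pkg_resources.DistributionNotFound:
        return None


# Target versions taken from the list above; names are assumed distribution names.
EXPECTED = {
    "mlflow": "1.11.0",
    "tensorboard": "2.3.0",
    "torch": "1.6.0",
    "torchvision": "0.7.0",
    "petastorm": "0.9.5",
}


def check_versions(expected, lookup=installed_version):
    """Map each mismatching package to a (expected, found) tuple."""
    mismatches = {}
    for name, wanted in expected.items():
        found = lookup(name)
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    # An empty dict means every listed package matches the expected version.
    print(check_versions(EXPECTED))
```

The `lookup` parameter exists so the comparison can be exercised without the packages being installed, for example by passing the `get` method of a plain dictionary.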
System environment
The system environment in Databricks Runtime 7.3 LTS for Machine Learning differs from Databricks Runtime 7.3 LTS as follows:
- DBUtils: Databricks Runtime ML does not contain the Library utility (dbutils.library) (legacy). You can use %pip and %conda commands instead. See Notebook-scoped Python libraries.
- For GPU clusters, Databricks Runtime ML includes the following NVIDIA GPU libraries:
  - CUDA 10.1 Update 2
  - cuDNN 7.6.5
  - NCCL 2.7.3
  - TensorRT 6.0.1
Libraries
The following sections list the libraries included in Databricks Runtime 7.3 LTS for Machine Learning that differ from those included in Databricks Runtime 7.3 LTS.
In this section:
- Top-tier libraries
- Python libraries
- R libraries
- Java and Scala libraries (Scala 2.12 cluster)
Top-tier libraries
Databricks Runtime 7.3 LTS for Machine Learning includes the following top-tier libraries:
- GraphFrames
- Horovod and HorovodRunner
- MLflow
- PyTorch
- spark-tensorflow-connector
- TensorFlow
- TensorBoard
Python libraries
Databricks Runtime 7.3 LTS for Machine Learning uses Conda for Python package management and includes many popular ML packages.
In addition to the packages specified in the Conda environments in the following sections, Databricks Runtime 7.3 LTS for Machine Learning also installs the following packages:
- hyperopt 0.2.4.db2
- sparkdl 2.1.0-db1
Python libraries on CPU clusters
name: databricks-ml
channels:
- pytorch
- defaults
dependencies:
- _libgcc_mutex=0.1=main
- absl-py=0.9.0=py37_0
- asn1crypto=1.3.0=py37_1
- astor=0.8.0=py37_0
- backcall=0.1.0=py37_0
- backports=1.0=py_2
- bcrypt=3.2.0=py37h7b6447c_0
- blas=1.0=mkl
- blinker=1.4=py37_0
- boto3=1.12.0=py_0
- botocore=1.15.0=py_0
- c-ares=1.15.0=h7b6447c_1001
- ca-certificates=2020.7.22=0
- cachetools=4.1.1=py_0
- certifi=2020.6.20=pyhd3eb1b0_3 # (updated from py37_0 in June 15, 2021 maintenance update)
- cffi=1.14.0=py37he30daa8_1 # (updated from py37h2e261b9_0 in June 15, 2021 maintenance update)
- chardet=3.0.4=py37_1003
- click=7.0=py37_0
- cloudpickle=1.3.0=py_0
- configparser=3.7.4=py37_0
- cpuonly=1.0=0
- cryptography=2.8=py37h1ba5d50_0
- cycler=0.10.0=py37_0
- cython=0.29.15=py37he6710b0_0
- decorator=4.4.1=py_0
- dill=0.3.1.1=py37_1
- docutils=0.15.2=py37_0
- entrypoints=0.3=py37_0
- flask=1.1.1=py_1
- freetype=2.9.1=h8a8886c_1
- future=0.18.2=py37_1
- gast=0.3.3=py_0
- gitdb=4.0.5=py_0
- gitpython=3.1.0=py_0
- google-auth=1.11.2=py_0
- google-auth-oauthlib=0.4.1=py_2
- google-pasta=0.2.0=py_0
- grpcio=1.27.2=py37hf8bcb03_0
- gunicorn=20.0.4=py37_0
- h5py=2.10.0=py37h7918eee_0
- hdf5=1.10.4=hb1b8bf9_0
- icu=58.2=he6710b0_3
- idna=2.8=py37_0
- intel-openmp=2020.0=166
- ipykernel=5.1.4=py37h39e3cac_0
- ipython=7.12.0=py37h5ca1d4c_0
- ipython_genutils=0.2.0=py37_0
- isodate=0.6.0=py_1
- itsdangerous=1.1.0=py37_0
- jedi=0.14.1=py37_0
- jinja2=2.11.1=py_0
- jmespath=0.10.0=py_0
- joblib=0.14.1=py_0
- jpeg=9b=h024ee3a_2
- jupyter_client=5.3.4=py37_0
- jupyter_core=4.6.1=py37_0
- kiwisolver=1.1.0=py37he6710b0_0
- krb5=1.17.1=h173b8e3_0 # (updated from 1.16.4 in June 15, 2021 maintenance update)
- ld_impl_linux-64=2.33.1=h53a641e_7
- libedit=3.1.20181209=hc058e9b_0
- libffi=3.3=he6710b0_2 # (updated from 3.2.1 in June 15, 2021 maintenance update)
- libgcc-ng=9.1.0=hdf63c60_0
- libgfortran-ng=7.3.0=hdf63c60_0
- libpng=1.6.37=hbc83047_0
- libpq=12.2=h20c2e04_0 # (updated from 11.2 in June 15, 2021 maintenance update)
- libprotobuf=3.11.4=hd408876_0
- libsodium=1.0.16=h1bed415_0
- libstdcxx-ng=9.1.0=hdf63c60_0
- libtiff=4.1.0=h2733197_0
- lightgbm=2.3.0=py37he6710b0_0
- lz4-c=1.8.1.2=h14c3975_0
- mako=1.1.2=py_0
- markdown=3.1.1=py37_0
- markupsafe=1.1.1=py37h14c3975_1
- matplotlib-base=3.1.3=py37hef1b27d_0
- mkl=2020.0=166
- mkl-service=2.3.0=py37he904b0f_0
- mkl_fft=1.0.15=py37ha843d7b_0
- mkl_random=1.1.0=py37hd6b4f25_0
- ncurses=6.2=he6710b0_1
- networkx=2.4=py_1
- ninja=1.10.0=py37hfd86e86_0
- nltk=3.4.5=py37_0
- numpy=1.18.1=py37h4f9e942_0
- numpy-base=1.18.1=py37hde5b4d6_1
- oauthlib=3.1.0=py_0
- olefile=0.46=py37_0
- openssl=1.1.1k=h27cfd23_0 # (updated from 1.1.1g in June 15, 2021 maintenance update)
- packaging=20.1=py_0
- pandas=1.0.1=py37h0573a6f_0
- paramiko=2.7.1=py_0
- parso=0.5.2=py_0
- patsy=0.5.1=py37_0
- pexpect=4.8.0=py37_1
- pickleshare=0.7.5=py37_1001
- pillow=7.0.0=py37hb39fc2d_0
- pip=20.0.2=py37_3
- plotly=4.9.0=py_0
- prompt_toolkit=3.0.3=py_0
- protobuf=3.11.4=py37he6710b0_0
- psutil=5.6.7=py37h7b6447c_0
- psycopg2=2.8.6=py37h3c74f83_1 # (updated from 2.8.4 in June 15, 2021 maintenance update)
- ptyprocess=0.6.0=py37_0
- pyasn1=0.4.8=py_0
- pyasn1-modules=0.2.7=py_0
- pycparser=2.19=py37_0
- pygments=2.5.2=py_0
- pyjwt=1.7.1=py37_0
- pynacl=1.3.0=py37h7b6447c_0
- pyodbc=4.0.30=py37he6710b0_0
- pyopenssl=19.1.0=py_1
- pyparsing=2.4.6=py_0
- pysocks=1.7.1=py37_1
- python=3.7.10=hdb3f193_0 # (updated from 3.7.6 in June 15, 2021 maintenance update)
- python-dateutil=2.8.1=py_0
- python-editor=1.0.4=py_0
- pytorch=1.6.0=py3.7_cpu_0
- pytz=2019.3=py_0
- pyzmq=18.1.1=py37he6710b0_0
- readline=8.1=h27cfd23_0 # (updated from 7.0 in June 15, 2021 maintenance update)
- requests=2.22.0=py37_1
- requests-oauthlib=1.3.0=py_0
- retrying=1.3.3=py37_2
- rsa=4.0=py_0
- s3transfer=0.3.3=py37_1
- scikit-learn=0.22.1=py37hd81dba3_0
- scipy=1.4.1=py37h0b6359f_0
- setuptools=45.2.0=py37_0
- simplejson=3.17.0=py37h7b6447c_0
- six=1.14.0=py37_0
- smmap=3.0.4=py_0
- sqlite=3.35.4=hdfb4753_0 # (updated from 3.31.1 in June 15, 2021 maintenance update)
- sqlparse=0.3.0=py_0
- statsmodels=0.11.0=py37h7b6447c_0
- tabulate=0.8.3=py37_0
- tk=8.6.10=hbc83047_0 # (updated from 8.6.8 in June 15, 2021 maintenance update)
- torchvision=0.7.0=py37_cpu
- tornado=6.0.3=py37h7b6447c_3
- tqdm=4.42.1=py_0
- traitlets=4.3.3=py37_0
- unixodbc=2.3.7=h14c3975_0
- urllib3=1.25.8=py37_0
- wcwidth=0.1.8=py_0
- websocket-client=0.56.0=py37_0
- werkzeug=1.0.0=py_0
- wheel=0.34.2=py37_0
- wrapt=1.11.2=py37h7b6447c_0
- xz=5.2.5=h7b6447c_0 # (updated from 5.2.4 in June 15, 2021 maintenance update)
- zeromq=4.3.1=he6710b0_3
- zlib=1.2.11=h7b6447c_3
- zstd=1.3.7=h0b5b093_0
- pip:
  - astunparse==1.6.3
  - azure-core==1.8.0
  - azure-storage-blob==12.4.0
  - databricks-cli==0.11.0
  - diskcache==5.0.2
  - docker==4.3.1
  - gorilla==0.3.0
  - horovod==0.19.5
  - joblibspark==0.2.0
  - keras-preprocessing==1.1.2
  - koalas==1.2.0
  - mleap==0.16.1
  - mlflow==1.11.0
  - msrest==0.6.18
  - opt-einsum==3.3.0
  - petastorm==0.9.5
  - pyarrow==1.0.1
  - pyyaml==5.3.1
  - querystring-parser==1.2.4
  - seaborn==0.10.0
  - spark-tensorflow-distributor==0.1.0
  - tensorboard==2.3.0
  - tensorboard-plugin-wit==1.7.0
  - tensorflow-cpu==2.3.0
  - tensorflow-estimator==2.3.0
  - termcolor==1.1.0
  - xgboost==1.1.1
prefix: /databricks/conda/envs/databricks-ml
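Each Conda dependency in the environment above is pinned as `name=version=build`, sometimes followed by a `#` comment about a maintenance update. The sketch below shows one way to split such a line into its parts. The `CondaPin` type and `parse_conda_pin` function are hypothetical names introduced for illustration; pip-style `name==version` entries are deliberately rejected.

```python
from collections import namedtuple

# Hypothetical container for the three fields of a Conda pin.
CondaPin = namedtuple("CondaPin", ["name", "version", "build"])


def parse_conda_pin(line):
    """Parse '- name=version=build  # comment' into a CondaPin."""
    # Drop any trailing comment, then surrounding whitespace.
    spec = line.split("#", 1)[0].strip()
    # Drop the YAML list marker if present.
    if spec.startswith("- "):
        spec = spec[2:].strip()
    parts = spec.split("=")
    # Exactly three non-empty fields are required; 'name==version' yields an
    # empty middle field and is rejected here.
    if len(parts) != 3 or not all(parts):
        raise ValueError("not a name=version=build pin: %r" % line)
    return CondaPin(*parts)


if __name__ == "__main__":
    sample = "- python=3.7.10=hdb3f193_0 # (updated from 3.7.6 in June 15, 2021 maintenance update)"
    print(parse_conda_pin(sample))
```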
Python libraries on GPU clusters
name: databricks-ml-gpu
channels:
- pytorch
- defaults
dependencies:
- _libgcc_mutex=0.1=main
- absl-py=0.9.0=py37_0
- asn1crypto=1.3.0=py37_1
- astor=0.8.0=py37_0
- backcall=0.1.0=py37_0
- backports=1.0=py_2
- bcrypt=3.2.0=py37h7b6447c_0
- blas=1.0=mkl
- blinker=1.4=py37_0
- boto3=1.12.0=py_0
- botocore=1.15.0=py_0
- c-ares=1.15.0=h7b6447c_1001
- ca-certificates=2020.7.22=0
- cachetools=4.1.1=py_0
- certifi=2020.6.20=pyhd3eb1b0_3 # (updated from py37_0 in June 15, 2021 maintenance update)
- cffi=1.14.0=py37he30daa8_1 # (updated from py37h2e261b9_0 in June 15, 2021 maintenance update)
- chardet=3.0.4=py37_1003
- click=7.0=py37_0
- cloudpickle=1.3.0=py_0
- configparser=3.7.4=py37_0
- cryptography=2.8=py37h1ba5d50_0
- cudatoolkit=10.1.243=h6bb024c_0
- cycler=0.10.0=py37_0
- cython=0.29.15=py37he6710b0_0
- decorator=4.4.1=py_0
- dill=0.3.1.1=py37_1
- docutils=0.15.2=py37_0
- entrypoints=0.3=py37_0
- flask=1.1.1=py_1
- freetype=2.9.1=h8a8886c_1
- future=0.18.2=py37_1
- gast=0.3.3=py_0
- gitdb=4.0.5=py_0
- gitpython=3.1.0=py_0
- google-auth=1.11.2=py_0
- google-auth-oauthlib=0.4.1=py_2
- google-pasta=0.2.0=py_0
- grpcio=1.27.2=py37hf8bcb03_0
- gunicorn=20.0.4=py37_0
- h5py=2.10.0=py37h7918eee_0
- hdf5=1.10.4=hb1b8bf9_0
- icu=58.2=he6710b0_3
- idna=2.8=py37_0
- intel-openmp=2020.0=166
- ipykernel=5.1.4=py37h39e3cac_0
- ipython=7.12.0=py37h5ca1d4c_0
- ipython_genutils=0.2.0=py37_0
- isodate=0.6.0=py_1
- itsdangerous=1.1.0=py37_0
- jedi=0.14.1=py37_0
- jinja2=2.11.1=py_0
- jmespath=0.10.0=py_0
- joblib=0.14.1=py_0
- jpeg=9b=h024ee3a_2
- jupyter_client=5.3.4=py37_0
- jupyter_core=4.6.1=py37_0
- kiwisolver=1.1.0=py37he6710b0_0
- krb5=1.17.1=h173b8e3_0 # (updated from 1.16.4 in June 15, 2021 maintenance update)
- ld_impl_linux-64=2.33.1=h53a641e_7
- libedit=3.1.20181209=hc058e9b_0
- libffi=3.3=he6710b0_2 # (updated from 3.2.1 in June 15, 2021 maintenance update)
- libgcc-ng=9.1.0=hdf63c60_0
- libgfortran-ng=7.3.0=hdf63c60_0
- libpng=1.6.37=hbc83047_0
- libpq=12.2=h20c2e04_0 # (updated from 11.2 in June 15, 2021 maintenance update)
- libprotobuf=3.11.4=hd408876_0
- libsodium=1.0.16=h1bed415_0
- libstdcxx-ng=9.1.0=hdf63c60_0
- libtiff=4.1.0=h2733197_0
- lightgbm=2.3.0=py37he6710b0_0
- lz4-c=1.8.1.2=h14c3975_0
- mako=1.1.2=py_0
- markdown=3.1.1=py37_0
- markupsafe=1.1.1=py37h14c3975_1
- matplotlib-base=3.1.3=py37hef1b27d_0
- mkl=2020.0=166
- mkl-service=2.3.0=py37he904b0f_0
- mkl_fft=1.0.15=py37ha843d7b_0
- mkl_random=1.1.0=py37hd6b4f25_0
- ncurses=6.2=he6710b0_1
- networkx=2.4=py_1
- ninja=1.10.0=py37hfd86e86_0
- nltk=3.4.5=py37_0
- numpy=1.18.1=py37h4f9e942_0
- numpy-base=1.18.1=py37hde5b4d6_1
- oauthlib=3.1.0=py_0
- olefile=0.46=py37_0
- openssl=1.1.1k=h27cfd23_0 # (updated from 1.1.1g in June 15, 2021 maintenance update)
- packaging=20.1=py_0
- pandas=1.0.1=py37h0573a6f_0
- paramiko=2.7.1=py_0
- parso=0.5.2=py_0
- patsy=0.5.1=py37_0
- pexpect=4.8.0=py37_1
- pickleshare=0.7.5=py37_1001
- pillow=7.0.0=py37hb39fc2d_0
- pip=20.0.2=py37_3
- plotly=4.9.0=py_0
- prompt_toolkit=3.0.3=py_0
- protobuf=3.11.4=py37he6710b0_0
- psutil=5.6.7=py37h7b6447c_0
- psycopg2=2.8.6=py37h3c74f83_1 # (updated from 2.8.4 in June 15, 2021 maintenance update)
- ptyprocess=0.6.0=py37_0
- pyasn1=0.4.8=py_0
- pyasn1-modules=0.2.7=py_0
- pycparser=2.19=py37_0
- pygments=2.5.2=py_0
- pyjwt=1.7.1=py37_0
- pynacl=1.3.0=py37h7b6447c_0
- pyodbc=4.0.30=py37he6710b0_0
- pyopenssl=19.1.0=py_1
- pyparsing=2.4.6=py_0
- pysocks=1.7.1=py37_1
- python=3.7.10=hdb3f193_0 # (updated from 3.7.6 in June 15, 2021 maintenance update)
- python-dateutil=2.8.1=py_0
- python-editor=1.0.4=py_0
- pytorch=1.6.0=py3.7_cuda10.1.243_cudnn7.6.3_0
- pytz=2019.3=py_0
- pyzmq=18.1.1=py37he6710b0_0
- readline=8.1=h27cfd23_0 # (updated from 7.0 in June 15, 2021 maintenance update)
- requests=2.22.0=py37_1
- requests-oauthlib=1.3.0=py_0
- retrying=1.3.3=py37_2
- rsa=4.0=py_0
- s3transfer=0.3.3=py37_1
- scikit-learn=0.22.1=py37hd81dba3_0
- scipy=1.4.1=py37h0b6359f_0
- setuptools=45.2.0=py37_0
- simplejson=3.17.0=py37h7b6447c_0
- six=1.14.0=py37_0
- smmap=3.0.4=py_0
- sqlite=3.35.4=hdfb4753_0 # (updated from 3.31.1 in June 15, 2021 maintenance update)
- sqlparse=0.3.0=py_0
- statsmodels=0.11.0=py37h7b6447c_0
- tabulate=0.8.3=py37_0
- tk=8.6.10=hbc83047_0 # (updated from 8.6.8 in June 15, 2021 maintenance update)
- torchvision=0.7.0=py37_cu101
- tornado=6.0.3=py37h7b6447c_3
- tqdm=4.42.1=py_0
- traitlets=4.3.3=py37_0
- unixodbc=2.3.7=h14c3975_0
- urllib3=1.25.8=py37_0
- wcwidth=0.1.8=py_0
- websocket-client=0.56.0=py37_0
- werkzeug=1.0.0=py_0
- wheel=0.34.2=py37_0
- wrapt=1.11.2=py37h7b6447c_0
- xz=5.2.5=h7b6447c_0 # (updated from 5.2.4 in June 15, 2021 maintenance update)
- zeromq=4.3.1=he6710b0_3
- zlib=1.2.11=h7b6447c_3
- zstd=1.3.7=h0b5b093_0
- pip:
  - astunparse==1.6.3
  - azure-core==1.8.0
  - azure-storage-blob==12.4.0
  - databricks-cli==0.11.0
  - diskcache==5.0.2
  - docker==4.3.1
  - gorilla==0.3.0
  - horovod==0.19.5
  - joblibspark==0.2.0
  - keras-preprocessing==1.1.2
  - koalas==1.2.0
  - mleap==0.16.1
  - mlflow==1.11.0
  - msrest==0.6.18
  - opt-einsum==3.3.0
  - petastorm==0.9.5
  - pyarrow==1.0.1
  - pyyaml==5.3.1
  - querystring-parser==1.2.4
  - seaborn==0.10.0
  - spark-tensorflow-distributor==0.1.0
  - tensorboard==2.3.0
  - tensorboard-plugin-wit==1.7.0
  - tensorflow==2.3.0
  - tensorflow-estimator==2.3.0
  - termcolor==1.1.0
  - xgboost==1.1.1
prefix: /databricks/conda/envs/databricks-ml-gpu
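The CPU and GPU environments are largely the same, so the interesting part is where they differ. The following sketch compares two lists of dependency lines and reports packages unique to each side and packages whose pin changed. The helper names `pins_by_name` and `diff_environments` are hypothetical, and the sample lines are a small subset copied from the two listings.

```python
def pins_by_name(lines):
    """Map package name -> pinned spec for conda ('=') and pip ('==') lines."""
    pins = {}
    for line in lines:
        # Remove trailing comments and the YAML list marker.
        spec = line.split("#", 1)[0].strip()
        if spec.startswith("- "):
            spec = spec[2:].strip()
        name, sep, rest = spec.partition("=")
        rest = rest.lstrip("=")  # tolerate pip-style 'name==version'
        if sep and rest:
            pins[name] = rest
    return pins


def diff_environments(cpu_lines, gpu_lines):
    """Summarize how two environment listings differ."""
    cpu, gpu = pins_by_name(cpu_lines), pins_by_name(gpu_lines)
    return {
        "cpu_only": sorted(set(cpu) - set(gpu)),
        "gpu_only": sorted(set(gpu) - set(cpu)),
        "changed": sorted(n for n in set(cpu) & set(gpu) if cpu[n] != gpu[n]),
    }


# A few lines taken from the CPU and GPU listings in this document.
CPU_SAMPLE = [
    "- cpuonly=1.0=0",
    "- numpy=1.18.1=py37h4f9e942_0",
    "- pytorch=1.6.0=py3.7_cpu_0",
    "- torchvision=0.7.0=py37_cpu",
    "- tensorflow-cpu==2.3.0",
]
GPU_SAMPLE = [
    "- cudatoolkit=10.1.243=h6bb024c_0",
    "- numpy=1.18.1=py37h4f9e942_0",
    "- pytorch=1.6.0=py3.7_cuda10.1.243_cudnn7.6.3_0",
    "- torchvision=0.7.0=py37_cu101",
    "- tensorflow==2.3.0",
]

if __name__ == "__main__":
    print(diff_environments(CPU_SAMPLE, GPU_SAMPLE))
```

On this sample, the CPU-only side contains `cpuonly` and `tensorflow-cpu`, the GPU-only side contains `cudatoolkit` and `tensorflow`, and `pytorch` and `torchvision` differ only in their build strings.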
Spark packages containing Python modules
Spark Package | Python Module | Version |
---|---|---|
graphframes | graphframes | 0.8.0-db2-spark3.0 |
R libraries
The R libraries are identical to the R libraries in Databricks Runtime 7.3 LTS.
Java and Scala libraries (Scala 2.12 cluster)
In addition to the Java and Scala libraries in Databricks Runtime 7.3 LTS, Databricks Runtime 7.3 LTS for Machine Learning contains the following JARs:
Group ID | Artifact ID | Version |
---|---|---|
com.typesafe.akka | akka-actor_2.12 | 2.5.23 |
ml.combust.mleap | mleap-databricks-runtime_2.12 | 0.17.3-4882dc3 |
ml.dmlc | xgboost4j-spark_2.12 | 1.0.0 |
ml.dmlc | xgboost4j_2.12 | 1.0.0 |
org.mlflow | mlflow-client | 1.11.0 |
org.scala-lang.modules | scala-java8-compat_2.12 | 0.8.0 |
org.tensorflow | spark-tensorflow-connector_2.12 | 1.15.0 |