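PyAudio is a well-known Python library for capturing, playing back, and recording audio such as speech, singing, and instrument sounds, but it has the following drawbacks*1:

It cannot use ASIO, the low-latency audio driver standard
It does not support Python 3.7 or later (as of this writing)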

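python-sounddevice is a library that supports Python 3.7 and later and can use ASIO-compatible devices.

It is also designed to be easier to use than PyAudio.

Below, I introduce how to use python-sounddevice, along with sample code for real-time capture, playback, and recording.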

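Features of python-sounddevice
Installation
Usage
Detecting devices per API (ASIO, etc.) and input/output
Real-time processing
- Capture
- Playback
- Playback + capture
- Recording
Playing and processing audio files
Summary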

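Features of python-sounddevice

Supports Python 3.7 and later (Python 2 is not supported)
Can use ASIO-compatible devices
Like PyAudio, it is a Python binding for the PortAudio library
Less verbose to write than PyAudio
Signal buffers are handled as NumPy arrays rather than bytes
The official documentation includes plenty of example code
MIT license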


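In practice, when you build real-time audio signal processing software, whether you can use low-latency ASIO or Core Audio devices has a large effect on the quality of the program, so I think python-sounddevice is a better choice than PyAudio.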

Installation

pip:

pip install sounddevice

python3 -m pip install sounddevice

conda:

conda install -c conda-forge python-sounddevice

Usage

Checking and configuring devices

sd.query_devices() lists the available devices. You select a device by specifying its index in the returned sounddevice.DeviceList.

Checking the default devices

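import sounddevice as sd

# List every device PortAudio can see (across all host APIs)
device_list = sd.query_devices()
print(device_list)

# sd.default.device holds the default (input, output) device indices
for device_number in sd.default.device:
    print(device_number)
    print(device_list[device_number])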

Example output

   0 Microsoft サウンド マッパー - Input, MME (2 in, 0 out)
>  1 IN (UA-25EX), MME (2 in, 0 out)
   2 ライン (Yamaha NETDUETTO Driver (W, MME (2 in, 0 out)
   3 マイク配列 (Realtek High Definition , MME (2 in, 0 out)
   4 VoiceMeeter Output (VB-Audio Vo, MME (2 in, 0 out)
   5 VoiceMeeter Aux Output (VB-Audi, MME (2 in, 0 out)
   6 Microsoft サウンド マッパー - Output, MME (0 in, 2 out)
<  7 OUT (UA-25EX), MME (0 in, 2 out)
......

1
{'name': 'IN (UA-25EX)', 'hostapi': 0, 'max_input_channels': 2, 'max_output_channels': 0, 'default_low_input_latency': 0.09, 'default_low_output_latency': 0.09, 'default_high_input_latency': 0.18, 'default_high_output_latency': 0.18, 'default_samplerate': 44100.0}
7
{'name': 'OUT (UA-25EX)', 'hostapi': 0, 'max_input_channels': 0, 'max_output_channels': 2, 'default_low_input_latency': 0.09, 'default_low_output_latency': 0.09, 'default_high_input_latency': 0.18, 'default_high_output_latency': 0.18, 'default_samplerate': 44100.0}
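Device indices such as 1 and 7 can change when hardware is added or removed, so hard-coding them is fragile. The sketch below shows one way to look up an index by part of a device name. `find_device` is a hypothetical helper, not part of sounddevice. The sample list reproduces only the name and channel-count fields from the output above. In real use you would pass `sd.query_devices()` instead.

```python
from typing import Iterable, Mapping, Optional


def find_device(devices: Iterable[Mapping], name_part: str, kind: str = "input") -> Optional[int]:
    """Return the index of the first device whose name contains name_part.

    devices: entries shaped like those returned by sd.query_devices()
    kind: "input" or "output"; devices with no channels in that direction are skipped
    """
    key = "max_input_channels" if kind == "input" else "max_output_channels"
    for index, device in enumerate(devices):
        if name_part in device["name"] and device[key] > 0:
            return index
    return None


# Devices 0-7 from the example output above (only the fields used here)
devices = [
    {"name": "Microsoft サウンド マッパー - Input", "max_input_channels": 2, "max_output_channels": 0},
    {"name": "IN (UA-25EX)", "max_input_channels": 2, "max_output_channels": 0},
    {"name": "ライン (Yamaha NETDUETTO Driver (W", "max_input_channels": 2, "max_output_channels": 0},
    {"name": "マイク配列 (Realtek High Definition ", "max_input_channels": 2, "max_output_channels": 0},
    {"name": "VoiceMeeter Output (VB-Audio Vo", "max_input_channels": 2, "max_output_channels": 0},
    {"name": "VoiceMeeter Aux Output (VB-Audi", "max_input_channels": 2, "max_output_channels": 0},
    {"name": "Microsoft サウンド マッパー - Output", "max_input_channels": 0, "max_output_channels": 2},
    {"name": "OUT (UA-25EX)", "max_input_channels": 0, "max_output_channels": 2},
]

print(find_device(devices, "UA-25EX", "input"))   # 1
print(find_device(devices, "UA-25EX", "output"))  # 7
```

sounddevice's own `device` arguments also accept a name substring in place of an index. A helper like this is mainly useful when you need the index itself, for example to fill a GUI list.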

Changing devices

To change the default devices above to the Microsoft Sound Mapper (0 for input, 6 for output), assign the device indices to sd.default.device as a list.

sd.default.device = [0, 6]
print(sd.default.device)  # => [0, 6]

If a single device supports both input and output, you can also assign a single index instead of a list.
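To find devices that can take a single index, you can filter for entries that report both input and output channels. Below is a minimal sketch using a hypothetical helper, `duplex_devices`. The sample entries reuse names and channel counts printed in this article: two MME devices and the two ASIO drivers listed further below. Their positions in the list are illustrative and do not reflect the real device numbering.

```python
def duplex_devices(devices):
    """Return (index, name) for every device with both input and output channels."""
    return [
        (index, device["name"])
        for index, device in enumerate(devices)
        if device["max_input_channels"] > 0 and device["max_output_channels"] > 0
    ]


# Illustrative list: names and channel counts from this article, order made up
devices = [
    {"name": "IN (UA-25EX)", "max_input_channels": 2, "max_output_channels": 0},
    {"name": "OUT (UA-25EX)", "max_input_channels": 0, "max_output_channels": 2},
    {"name": "ASIO4ALL v2", "max_input_channels": 2, "max_output_channels": 2},
    {"name": "FL Studio ASIO", "max_input_channels": 2, "max_output_channels": 2},
]

for index, name in duplex_devices(devices):
    print(index, name)
```

With a real device list, any index returned here could be assigned directly, as in `sd.default.device = index`.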

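Device types

Each device belongs to a host API such as ASIO or MME, and the index of that host API is stored in the device's hostapi field.

sd.query_hostapis() returns the indices of the devices that belong to each API.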

>>> from pprint import pprint
>>> pprint(sd.query_hostapis())
({'default_input_device': 1,
  'default_output_device': 7,
  'devices': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11],
  'name': 'MME'},
 {'default_input_device': 12,
  'default_output_device': 18,
  'devices': [12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23],
  'name': 'Windows DirectSound'},
 {'default_input_device': 24,
  'default_output_device': 24,
  'devices': [24, 25, 26, 27, 28, 29, 30],
  'name': 'ASIO'},
 {'default_input_device': 37,
  'default_output_device': 31,
  'devices': [31, 32, 33, 34, 35, 36, 37, 38, 39, 40],
  'name': 'Windows WASAPI'},
 {'default_input_device': 41,
  'default_output_device': 42,
  'devices': [41,
              42,
...
              66],
  'name': 'Windows WDM-KS'})
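The list above can also be searched programmatically. As a sketch, suppose you want to switch both default devices to those of a given host API. The hypothetical helpers below look up an API by name and return its index, its device list, and its default (input, output) pair. The sample data is the first four entries of the output above. In practice you would pass `sd.query_hostapis()`.

```python
def find_hostapi(hostapis, api_name):
    """Return (hostapi_index, hostapi_dict) for the host API named api_name."""
    for index, api in enumerate(hostapis):
        if api["name"] == api_name:
            return index, api
    raise ValueError(f"host API not found: {api_name!r}")


def default_pair(hostapis, api_name):
    """Return [default_input_device, default_output_device] for the named API."""
    _, api = find_hostapi(hostapis, api_name)
    return [api["default_input_device"], api["default_output_device"]]


# First four entries of the sd.query_hostapis() output shown above
hostapis = (
    {"name": "MME", "default_input_device": 1, "default_output_device": 7,
     "devices": list(range(0, 12))},
    {"name": "Windows DirectSound", "default_input_device": 12, "default_output_device": 18,
     "devices": list(range(12, 24))},
    {"name": "ASIO", "default_input_device": 24, "default_output_device": 24,
     "devices": list(range(24, 31))},
    {"name": "Windows WASAPI", "default_input_device": 37, "default_output_device": 31,
     "devices": list(range(31, 41))},
)

index, asio = find_hostapi(hostapis, "ASIO")
print(index, asio["devices"])          # 2 [24, 25, 26, 27, 28, 29, 30]
print(default_pair(hostapis, "ASIO"))  # [24, 24]
```

The returned index matches the hostapi field of each device (2 for ASIO here). The pair could then be assigned with `sd.default.device = default_pair(sd.query_hostapis(), "ASIO")`.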

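Detecting devices per API (ASIO, etc.) and input/output

To let users pick an API in the app's GUI, I build a dictionary that holds device information for each API (ASIO, MME, and so on) and for each direction (input/output).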

import sounddevice as sd

# di[hostapi_name] = {"inputs": [device, ...], "outputs": [device, ...]}
di = {}
for hostapi in sd.query_hostapis():
    hostapi_name = hostapi["name"]
    di[hostapi_name] = {"inputs": [], "outputs": []}
    for device_number in hostapi["devices"]:
        device = sd.query_devices(device=device_number)

        # A device appears in both lists if it supports input and output
        if device["max_input_channels"] > 0:
            di[hostapi_name]["inputs"].append(device)

        if device["max_output_channels"] > 0:
            di[hostapi_name]["outputs"].append(device)

>>> from pprint import pprint
>>> pprint(di["ASIO"])
{'inputs': [{'default_high_input_latency': 0.046439909297052155,
             'default_high_output_latency': 0.046439909297052155,
             'default_low_input_latency': 0.011609977324263039,
             'default_low_output_latency': 0.011609977324263039,
             'default_samplerate': 44100.0,
             'hostapi': 2,
             'max_input_channels': 2,
             'max_output_channels': 2,
             'name': 'ASIO4ALL v2'},
            {'default_high_input_latency': 0.09287981859410431,
             'default_high_output_latency': 0.09287981859410431,
             'default_low_input_latency': 0.09287981859410431,
             'default_low_output_latency': 0.09287981859410431,
             'default_samplerate': 44100.0,
             'hostapi': 2,
             'max_input_channels': 2,
             'max_output_channels': 2,
             'name': 'FL Studio ASIO'},
             ...],

 'outputs': [{'default_high_input_latency': 0.046439909297052155,
              'default_high_output_latency': 0.046439909297052155,
              'default_low_input_latency': 0.011609977324263039,
              'default_low_output_latency': 0.011609977324263039,
              'default_samplerate': 44100.0,
              'hostapi': 2,
              'max_input_channels': 2,
              'max_output_channels': 2,
              'name': 'ASIO4ALL v2'},
             {'default_high_input_latency': 0.09287981859410431,
              'default_high_output_latency': 0.09287981859410431,
              'default_low_input_latency': 0.09287981859410431,
              'default_low_output_latency': 0.09287981859410431,
              'default_samplerate': 44100.0,
              'hostapi': 2,
              'max_input_channels': 2,
              'max_output_channels': 2,
              'name': 'FL Studio ASIO'},
              ...]}
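The default latencies reported above are easier to read in sample frames. At 44,100 Hz, 0.011609977... s is exactly 512 frames, 0.046439909... s is 2048 frames, and 0.092879818... s is 4096 frames. The sketch below does that conversion and picks, from one of the lists in `di`, the device with the lowest default low latency. `latency_frames` and `lowest_latency` are hypothetical helpers, and the sample entries are the two ASIO devices shown above.

```python
def latency_frames(latency_s, samplerate):
    """Convert a latency in seconds to the nearest whole number of frames."""
    return round(latency_s * samplerate)


def lowest_latency(devices, kind="input"):
    """Return the device dict with the smallest default low latency for kind."""
    key = f"default_low_{kind}_latency"
    return min(devices, key=lambda device: device[key])


# The two ASIO input devices from the output above (only the fields used here)
asio_inputs = [
    {"name": "ASIO4ALL v2", "default_samplerate": 44100.0,
     "default_low_input_latency": 0.011609977324263039},
    {"name": "FL Studio ASIO", "default_samplerate": 44100.0,
     "default_low_input_latency": 0.09287981859410431},
]

best = lowest_latency(asio_inputs)
print(best["name"])  # ASIO4ALL v2
print(latency_frames(best["default_low_input_latency"], best["default_samplerate"]))  # 512
```

In real use you would pass `di["ASIO"]["inputs"]`. The stream classes also take a `latency` argument ('low', 'high', or a value in seconds). The default is 'high', so pass 'low' explicitly if you want the low-latency setting.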

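Real-time processing

python-sounddevice provides classes such as InputStream and OutputStream, which handle real-time input and output as NumPy arrays. This is very convenient.

If you want to work with raw byte buffers directly, use classes such as RawInputStream.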
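With the raw stream classes, the callback receives a plain buffer of interleaved samples instead of a NumPy array. If you still want NumPy for part of the processing, a sketch like the following converts such a buffer. `raw_to_float` is a hypothetical helper. It assumes the stream was opened with an integer dtype such as 'int16' and that the buffer supports Python's buffer protocol, which `np.frombuffer` requires.

```python
import numpy as np


def raw_to_float(buffer, channels, dtype="int16"):
    """Convert an interleaved integer PCM buffer to float32 with shape (frames, channels).

    The integer range is scaled to [-1.0, 1.0).
    """
    dtype = np.dtype(dtype)
    samples = np.frombuffer(buffer, dtype=dtype).reshape(-1, channels)
    scale = float(np.iinfo(dtype).max) + 1.0  # 32768 for int16
    return samples.astype(np.float32) / scale


# Two stereo frames packed as bytes, the way a raw callback would see them
pcm = np.array([[0, -32768], [16384, 32767]], dtype=np.int16)
print(raw_to_float(pcm.tobytes(), channels=2))
```

Inside a RawInputStream callback, this would be called as `raw_to_float(indata, channels)`.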

Capture

import sounddevice as sd
import numpy as np

duration = 10  # capture for 10 seconds

sd.default.device = [3, 10]  # input and output device indices

def callback(indata, frames, time, status):
    # indata.shape = (n_samples, n_channels)
    if status:
        print(status)  # report overflows and other stream errors
    # print the root mean square of the current block
    print(np.sqrt(np.mean(indata**2)))

with sd.InputStream(
        channels=1,
        dtype='float32',
        callback=callback
    ):
    sd.sleep(int(duration * 1000))
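The raw RMS values printed by the callback are hard to read as a level meter. A common alternative is to express them in dBFS (decibels relative to full scale), 20·log10(RMS). On this scale, 0 dB is a full-scale constant signal and a full-scale sine is about -3 dB. The helper name `rms_dbfs` and the floor value `eps`, which avoids log10(0) on silence, are my own choices for this sketch.

```python
import numpy as np


def rms_dbfs(block, eps=1e-12):
    """Return the RMS level of a block in dBFS (0 dB = full-scale constant signal)."""
    block = np.asarray(block, dtype=np.float64)
    rms = np.sqrt(np.mean(block ** 2))
    return 20.0 * np.log10(max(rms, eps))  # eps avoids log10(0) for silence


# Full-scale 1 kHz sine at 48 kHz: RMS = 1/sqrt(2), i.e. about -3.01 dBFS
t = np.arange(48000) / 48000
print(round(rms_dbfs(np.sin(2 * np.pi * 1000 * t)), 2))  # -3.01
```

In the callback above, `print(rms_dbfs(indata))` would replace the plain RMS print.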

Playback

With OutputStream, the callback writes the playback signal block by block.

The sample code below plays a 440 Hz sine wave for 3 seconds.

import sounddevice as sd
import numpy as np

duration = 3  # play for 3 seconds

sd.default.device = [3, 10]  # input and output device indices
output_device_info = sd.query_devices(device=sd.default.device[1])

f0 = 440  # sine frequency [Hz]
offset = 0  # samples already played; keeps the phase continuous across blocks
sr_out = int(output_device_info["default_samplerate"])

def callback(outdata, frames, time, status):
    global offset
    if status:
        print(status)
    n_samples, n_channels = outdata.shape

    t = np.arange(offset, offset + n_samples) / sr_out
    for k in range(n_channels):
        # scale down by the channel count to stay well below full scale
        outdata[:, k] = np.sin(2 * np.pi * f0 * t) / n_channels

    offset += n_samples

with sd.OutputStream(
        samplerate=sr_out,
        channels=2,
        dtype='float32',
        callback=callback
    ):
    sd.sleep(int(duration * 1000))
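The offset variable keeps the sine continuous across blocks. Each callback continues from the sample index where the previous block stopped, so no click appears at block boundaries. The same idea can be wrapped in a small class instead of a global variable. This sketch uses a hypothetical class name, `SineBlockGenerator`. It only produces NumPy blocks, so it can be checked without an audio device.

```python
import numpy as np


class SineBlockGenerator:
    """Produce consecutive blocks of a sine wave without phase jumps between blocks."""

    def __init__(self, f0, samplerate, amplitude=0.2):
        self.f0 = f0
        self.samplerate = samplerate
        self.amplitude = amplitude
        self.offset = 0  # index of the next sample to generate

    def next_block(self, n_samples, n_channels):
        t = (self.offset + np.arange(n_samples)) / self.samplerate
        mono = self.amplitude * np.sin(2 * np.pi * self.f0 * t)
        self.offset += n_samples
        # same signal on every channel, shape (n_samples, n_channels)
        return np.tile(mono[:, np.newaxis], (1, n_channels)).astype(np.float32)


# Three blocks of different sizes match one long block generated in a single call
gen = SineBlockGenerator(440, 44100)
blocks = [gen.next_block(n, 2) for n in (256, 256, 100)]
whole = SineBlockGenerator(440, 44100).next_block(612, 2)
print(np.allclose(np.concatenate(blocks), whole))  # True
```

Inside an OutputStream callback, `outdata[:] = gen.next_block(frames, outdata.shape[1])` would replace the manual offset bookkeeping.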

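Playback + capture

With sounddevice.Stream(), you can capture and play back in a single callback function.

This is a good fit for software that picks up sound from a microphone and applies an effect to it.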

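import sounddevice as sd

duration = 10  # run for 10 seconds
gain = 0.5  # example effect: a simple volume change

def your_processing(indata):
    # Placeholder effect: replace with your own processing
    return indata * gain

def callback(indata, outdata, frames, time, status):
    if status:
        print(status)
    outdata[:] = your_processing(indata)

with sd.Stream(
        channels=2,
        dtype='float32',
        callback=callback
    ):
    sd.sleep(int(duration * 1000))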
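`your_processing` above is a stand-in for your own effect. Many effects need memory that carries over from one callback to the next. As a sketch, here is a simple feedforward echo, y[n] = x[n] + g·x[n-D], that keeps the last D input samples between calls. The class name and its parameters are hypothetical. The class assumes the (frames, channels) arrays that sd.Stream passes to the callback.

```python
import numpy as np


class FeedforwardEcho:
    """y[n] = x[n] + gain * x[n - delay], processed block by block."""

    def __init__(self, delay_samples, gain, n_channels):
        assert delay_samples >= 1, "delay must be at least one sample"
        self.delay = delay_samples
        self.gain = gain
        # last `delay` input samples from previous blocks (silence at start)
        self.history = np.zeros((delay_samples, n_channels), dtype=np.float32)

    def process(self, x):
        # buf[i] is the input sample delay - i samples before this block starts
        buf = np.concatenate([self.history, x], axis=0)
        delayed = buf[:len(x)]  # x[n - delay] for every n in this block
        self.history = buf[-self.delay:]
        return x + self.gain * delayed
```

In the callback, this would be used as `outdata[:] = echo.process(indata)` with, for example, `echo = FeedforwardEcho(4410, 0.4, 2)`, which gives a 0.1 s echo at 44.1 kHz. A feedforward echo is used here because it is the simplest stateful case. A feedback echo would also need to keep past output samples.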

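Recording

Recording is very simple: the sd.rec() function does it in one line.

For example, with SoundFile, a Python library for reading and writing audio files, you can save the recorded signal (a NumPy array) in WAV format.

Here is the sample code.

sd.rec() runs asynchronously: it returns immediately while recording continues in the background. Call sd.wait() when you need to wait for the recording to finish.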

import sounddevice as sd
import soundfile as sf

duration = 3  # record for 3 seconds

# Device settings
sd.default.device = [3, 7]  # input and output device indices
input_device_info = sd.query_devices(device=sd.default.device[0])  # [0] is the input device
sr_in = int(input_device_info["default_samplerate"])

# Record (sd.rec() returns immediately)
myrecording = sd.rec(int(duration * sr_in), samplerate=sr_in, channels=2)
sd.wait()  # wait until recording finishes

print(myrecording.shape)  # => (duration * sr_in, 2)

# Save the recorded NumPy array as a WAV file
sf.write("./myrecording.wav", myrecording, sr_in)
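If you would rather not add SoundFile as a dependency, Python's standard-library wave module can also write the recording, provided you convert the float samples to 16-bit integers yourself. This is a sketch under that assumption. `float_to_pcm16` and `write_wav_int16` are hypothetical helper names, and samples outside [-1, 1] are clipped.

```python
import wave

import numpy as np


def float_to_pcm16(signal):
    """Convert float samples in [-1, 1] to little-endian int16 with shape (frames, channels)."""
    signal = np.asarray(signal, dtype=np.float64)
    if signal.ndim == 1:
        signal = signal[:, np.newaxis]  # treat a 1-D array as mono
    clipped = np.clip(signal, -1.0, 1.0)
    return np.round(clipped * 32767).astype("<i2")


def write_wav_int16(path, signal, samplerate):
    """Write a float recording, such as the one from sd.rec(), as a 16-bit PCM WAV file."""
    pcm = float_to_pcm16(signal)
    with wave.open(path, "wb") as wf:
        wf.setnchannels(pcm.shape[1])
        wf.setsampwidth(2)  # 2 bytes = 16 bits per sample
        wf.setframerate(int(samplerate))
        wf.writeframes(pcm.tobytes())  # C order = interleaved frames
```

`write_wav_int16("./myrecording.wav", myrecording, sr_in)` would then replace the `sf.write` call.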