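Introduction

librosa, a Python module for music signal analysis, provides librosa.salience, a function that computes a salience spectrogram. It is intended as a tool for extracting pitch, melody, and chord information from music more accurately.

This post presents sample code that uses librosa.salience to extract pitch.

An explanation of the salience spectrogram and of how librosa.salience() computes it is given in the following article.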

www.wizard-notes.com

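How to use librosa.salience

The comments in the signature below describe the required inputs and the parameters you should tune first.

For details, see the librosa documentation. The signature shown here follows the librosa version used when this post was written; in recent librosa releases the arguments after S are keyword-only and h_range is named harmonics.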

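librosa.salience(
   S, # STFT or CQT magnitude spectrogram of the signal to analyze
   freqs, # center frequency of each frequency bin of S
   h_range, # which harmonics to take into account when computing
            # the salience spectrogram
   weights=None,  # how much weight to give each harmonic listed in h_range
   aggregate=None, 
   filter_peaks=True, 
   fill_value=nan, 
   kind='linear', 
   axis=0
)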
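
To make the roles of h_range and weights concrete, here is a minimal NumPy-only sketch of the idea behind a salience spectrogram: for each frequency bin, take the weighted average of the energy found at its harmonics. The function name harmonic_sum_salience is hypothetical and this is not librosa's implementation; it ignores peak filtering (filter_peaks), the aggregate function, and the interpolation options, and it assumes that freqs is sorted in ascending order.

```python
import numpy as np


def harmonic_sum_salience(S, freqs, h_range, weights=None):
    """Simplified salience: weighted average of the energy at each harmonic.

    S       : (n_freqs, n_frames) magnitude spectrogram
    freqs   : (n_freqs,) center frequency of each bin, ascending
    h_range : harmonics to consider (1 = fundamental)
    weights : one weight per harmonic (defaults to equal weights)

    Hypothetical helper for illustration only, not part of librosa.
    """
    S = np.asarray(S, dtype=float)
    freqs = np.asarray(freqs, dtype=float)
    if weights is None:
        weights = np.ones(len(h_range))
    weights = np.asarray(weights, dtype=float)

    salience = np.zeros_like(S)
    for h, w in zip(h_range, weights):
        for t in range(S.shape[1]):
            # Energy at h * f for every bin frequency f (linear interpolation).
            # Harmonics that fall outside the analyzed range contribute 0.
            salience[:, t] += w * np.interp(
                h * freqs, freqs, S[:, t], left=0.0, right=0.0
            )
    return salience / weights.sum()
```

With h_range = [1, 2] and weights = [1.0, 0.5], a bin whose second harmonic also carries energy scores higher than an isolated bin of the same magnitude, which is why the fundamental of a harmonic tone tends to stand out.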

Source code and audio

import numpy as np
import librosa
import librosa.display
import matplotlib.pyplot as plt


filepath = "./miku_doremi_bpm120.wav"
y, sr = librosa.load(filepath, mono=True, offset=0.3, duration=7)


# Salience representation from the magnitude (STFT) spectrogram
h_range = [1, 2, 3, 4, 5, 6] # harmonics to take into account; [1] alone means the fundamental only
weights = [1.0, 0.5, 0.33, 0.25, 0.12, 0.06] # weight for each harmonic in h_range

n_fft = 1024

M = np.abs(librosa.stft(y, n_fft=n_fft))
fft_freqs = librosa.fft_frequencies(sr=sr, n_fft=n_fft) # center frequency of each frequency bin
# Note: recent librosa releases require keyword arguments after the first one
# and name the harmonics argument `harmonics` instead of `h_range`.
M_salience = librosa.salience(M, fft_freqs, h_range, weights, fill_value=0)

# Salience representation from the CQT
h_range = [1, 2] # harmonics to take into account; [1] alone means the fundamental only
weights = [1.0, 0.5] # weight for each harmonic in h_range
n_bins = 60
fmin = librosa.note_to_hz('C3') # 130.8128 Hz
C = np.abs(librosa.cqt(y, n_bins=n_bins, sr=sr, fmin=fmin))
cqt_freqs = librosa.cqt_frequencies(n_bins=n_bins, fmin=fmin) # center frequency of each frequency bin
C_salience = librosa.salience(C, cqt_freqs, h_range, weights, fill_value=0)



# Plot the original spectrogram (top) and its salience spectrogram (bottom)
def plot(S_before, S_after, title_before, y_axis="linear", fmin=None):
    plt.subplot(2,1,1)
    librosa.display.specshow(librosa.amplitude_to_db(S_before, ref=np.max),
                             sr=sr, y_axis=y_axis, x_axis='time', fmin=fmin)
    plt.title(title_before)
    plt.colorbar(format="%+2.0f dB")

    plt.subplot(2,1,2)
    librosa.display.specshow(librosa.amplitude_to_db(S_after, ref=np.max),
                             sr=sr, y_axis=y_axis, x_axis='time', fmin=fmin)
    plt.title('Salience spectrogram')
    plt.colorbar(format="%+2.0f dB")

    plt.tight_layout()
    plt.show()


plot(M, M_salience, 'Magnitude spectrogram', y_axis="log")
# Pass fmin so the CQT note axis starts at C3 instead of specshow's default (C1)
plot(C, C_salience, 'CQT spectrogram', y_axis='cqt_note', fmin=fmin)
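
The listing above stops at plotting. As one possible next step toward an actual pitch track, the sketch below picks the strongest salience bin in each frame and maps it to a frequency. The function name salience_to_pitch and the threshold_db parameter are assumptions introduced for illustration; this is a naive per-frame argmax, not a method provided by librosa, and it does no smoothing or octave-error correction.

```python
import numpy as np


def salience_to_pitch(salience, freqs, threshold_db=-40.0):
    """Pick the strongest salience bin in each frame.

    salience     : (n_freqs, n_frames) salience spectrogram
    freqs        : (n_freqs,) center frequency of each bin in Hz
    threshold_db : frames whose peak is this far below the global maximum
                   are treated as unvoiced (hypothetical parameter)

    Returns an (n_frames,) array of frequencies in Hz, np.nan where unvoiced.
    """
    # NaN can appear when salience is computed with the default fill_value.
    salience = np.nan_to_num(np.asarray(salience, dtype=float), nan=0.0)
    freqs = np.asarray(freqs, dtype=float)

    peak_bins = salience.argmax(axis=0)  # strongest bin per frame
    peak_vals = salience.max(axis=0)     # its salience value

    global_max = salience.max()
    if global_max <= 0:
        return np.full(salience.shape[1], np.nan)

    # Convert the dB threshold to a linear amplitude floor.
    floor = global_max * 10.0 ** (threshold_db / 20.0)
    f0 = freqs[peak_bins]                # fancy indexing returns a copy
    f0[peak_vals < floor] = np.nan
    return f0
```

With the variables from the listing, this could be called as f0 = salience_to_pitch(C_salience, cqt_freqs); the frequencies of the voiced frames can then be turned into note names with librosa.hz_to_note.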