
[TarsosDSP] Real-time Pitch Detection

by nitronium102 2021. 11. 19.

The project I am currently working on has two main features.

1) Take the user's voice data, detect the pitch of each sound in real time, and convert it to a note

2) Shift the pitch of existing audio data to match a key that suits the user

This post covers the process of testing the first feature.

Choosing a library

For roughly two months I looked for an open-source library that supports real-time pitch detection and pitch shifting, but most libraries did not offer real-time processing. Eventually I found TarsosDSP, which supports real-time processing and is also fully compatible with Android Studio, which I am using now.

 

TarsosDSP

TarsosDSP is a Java library for audio processing. It includes several pitch detection algorithms, such as YIN, the McLeod Pitch Method, and Dynamic Wavelet Algorithm Pitch Tracking. It also provides a Goertzel DTMF decoding algorithm, a time stretch algorithm (WSOLA), resampling, filters, simple synthesis, some audio effects, and a pitch shifting algorithm.

 

Installing TarsosDSP

01. Download TarsosDSP

To use the TarsosDSP library on Android, you need to download the corresponding release of the library.

TarsosDSP for Android homepage

 

Download the file TarsosDSP-latest/TarsosDSP-Android-latest.jar. Do not extract it!

 

02. Add the dependency to the project

To add the downloaded jar file to the libs folder, switch the project view from Android to Project.

Put the downloaded TarsosDSP-Android-latest.jar file into the libs folder as-is.

From the top menu, open File -> Project Structure, select Dependencies, and then select the app module to see the list of dependencies added so far.

Click the (+) button, choose JAR/AAR Dependency, and enter libs/TarsosDSP-Android-latest.jar as the path to register it.

If it was registered correctly, the following line appears at the bottom of the build.gradle file.

(This is the build.gradle inside the app folder.)

// build.gradle(app)
dependencies {
    ...
    implementation files('libs/TarsosDSP-Android-latest.jar')
}

 

03. Modify the xml files

Since the app needs the microphone, I set targetSdkVersion to 22 to keep permission handling simple: with a target SDK below 23, permissions declared in the manifest are granted at install time, so no runtime permission request is needed.

// build.gradle(app)
android {
    ...
    defaultConfig {
        ...
        targetSdk 22
        ...
    }
}

 

Register the audio and storage permissions in AndroidManifest.xml as well. Put the following lines directly below the opening manifest tag.

<manifest xmlns:android="http://schemas.android.com/apk/res/android"
    package="...">

    <uses-permission android:name="android.permission.RECORD_AUDIO"/>
    <uses-permission android:name="android.permission.WRITE_EXTERNAL_STORAGE"/>
    <uses-permission android:name="android.permission.READ_EXTERNAL_STORAGE"/>
    ...
</manifest>

 

Walkthrough and code

1. Basic GUI setup

I implemented a text view that shows the pitch of the recorded voice in real time, plus a record button and a play button.

// activity_main.xml
<?xml version="1.0" encoding="utf-8"?>
<LinearLayout xmlns:android="http://schemas.android.com/apk/res/android"
    android:layout_width="match_parent"
    android:layout_height="match_parent"
    android:orientation="vertical">

    <FrameLayout
        android:layout_width="match_parent"
        android:layout_height="507dp"
        android:padding="16dp">

        <TextView
            android:id="@+id/textView"
            android:layout_width="wrap_content"
            android:layout_height="wrap_content"
            android:text="Pitch:" />

        <TextView
            android:id="@+id/pitchTextView"
            android:layout_width="wrap_content"
            android:layout_height="wrap_content"
            android:layout_gravity="center"
            android:text="0"
            android:textSize="50sp" />

    </FrameLayout>

    <Button
        android:id="@+id/recordButton"
        android:layout_width="match_parent"
        android:layout_height="wrap_content"
        android:text="Record" />

    <Button
        android:id="@+id/playButton"
        android:layout_width="match_parent"
        android:layout_height="wrap_content"
        android:text="Play" />

</LinearLayout>

 

2. TarsosDSP Format Settings

This step specifies how the input data will be processed. Since it is not the main feature being tested here, I took the values from an existing reference and plan to fine-tune them later.

public TarsosDSPAudioFormat(TarsosDSPAudioFormat.Encoding encoding,
                            float sampleRate,
                            int sampleSizeInBits,
                            int channels,
                            int frameSize,
                            float frameRate,
                            boolean bigEndian)

Parameters
encoding - the audio encoding technique
sampleRate - the number of samples per second
sampleSizeInBits - the number of bits in each sample
channels - the number of channels (1 for mono, 2 for stereo, and so on)
frameSize - the number of bytes in each frame
frameRate - the number of frames per second
bigEndian - indicates whether the data for a single sample is stored in big-endian byte order (false means little-endian)

TarsosDSPAudioFormat tarsosDSPAudioFormat;

protected void onCreate(Bundle savedInstanceState)
{
    ...
    tarsosDSPAudioFormat=new TarsosDSPAudioFormat(TarsosDSPAudioFormat.Encoding.PCM_SIGNED,
                22050,
                2 * 8,
                1,
                2 * 1,
                22050,
                ByteOrder.BIG_ENDIAN.equals(ByteOrder.nativeOrder()));
    ...
}
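
The frameSize and frameRate arguments above are not independent of the others. For uncompressed PCM, one frame holds one sample per channel, so the frame size is (bits per sample / 8) x channels and the frame rate equals the sample rate. A minimal sketch of that arithmetic, assuming PCM input; PcmFrameMath and its methods are hypothetical helper names, not part of TarsosDSP:

```java
// Hypothetical helper (not part of TarsosDSP): PCM frame arithmetic.
final class PcmFrameMath {
    private PcmFrameMath() {}

    // One PCM frame holds one sample for every channel.
    static int frameSizeBytes(int sampleSizeInBits, int channels) {
        return (sampleSizeInBits / 8) * channels;
    }

    // Raw data rate of the stream, useful for estimating recording file size.
    static int bytesPerSecond(float sampleRate, int sampleSizeInBits, int channels) {
        return Math.round(sampleRate * frameSizeBytes(sampleSizeInBits, channels));
    }
}
```

With the values used in this post (16 bits, mono, 22050 Hz) this gives a 2-byte frame, which matches the 2 * 1 passed as frameSize.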

 

3. Voice Recording & Real Time Pitch Detection

This is the feature at the core of our project!

1) Use TarsosDSP to record the user's voice from the microphone while converting the frequency of that voice into a note.

2) Store the time at which the voice was input together with the converted note in a single hashmap.

3) Compare this hashmap with the <time, note> hashmap of the original track to judge whether the user's key strays from the pitch of the original song and whether the timing matches.

This post explains only (1) and (2), which process the user's voice.
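
Step (2) can be sketched in plain Java without any audio code. This is only an illustration under stated assumptions, not the project's code: NoteTimeline and its methods are hypothetical names, and the sketch uses a TreeMap rather than a HashMap so that entries come back ordered by time (a HashMap gives no ordering guarantee when iterated).

```java
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: stores <seconds since start, note> pairs in time order.
final class NoteTimeline {
    private final long startMillis;
    private final TreeMap<Double, String> notes = new TreeMap<>();

    NoteTimeline(long startMillis) {
        this.startMillis = startMillis;
    }

    // Converts a wall-clock timestamp to seconds since recording started and stores the note.
    void record(long nowMillis, String note) {
        double seconds = (nowMillis - startMillis) / 1000.0;
        notes.put(seconds, note);
    }

    // Read-only view, iterated in ascending time order.
    Map<Double, String> entries() {
        return Collections.unmodifiableMap(notes);
    }
}
```

In an Activity this would be created with System.currentTimeMillis() when recording starts, and record(...) would be called from the pitch handler.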

  • Measure the start time
long start = System.currentTimeMillis();
  • Create a hashmap to hold <time, note> pairs. I chose the Double type for the key because the time has to be kept at millisecond resolution.
HashMap<Double, String> dictionary = new HashMap<Double, String>();
  • Release the dispatcher object currently in use and create a new dispatcher that takes its input from the microphone.
releaseDispatcher();
dispatcher = AudioDispatcherFactory.fromDefaultMicrophone(22050,1024,0);
  • Create a RandomAccessFile to save the recorded voice, then create a WriterProcessor (an AudioProcessor) that writes the audio data to that output and add it to the dispatcher.
RandomAccessFile randomAccessFile = new RandomAccessFile(file,"rw");
AudioProcessor recordProcessor = new WriterProcessor(tarsosDSPAudioFormat, randomAccessFile);
dispatcher.addAudioProcessor(recordProcessor);
  • Create a pitch detection handler that reads the detected pitch and converts it to a note.
PitchDetectionHandler pitchDetectionHandler = new PitchDetectionHandler() {
    @Override
    public void handlePitch(PitchDetectionResult res, AudioEvent e){
        final float pitchInHz = res.getPitch(); // get the detected pitch
        String octav = ProcessPitch.processPitch(pitchInHz); // pitch -> note
        runOnUiThread(new Runnable() {
        ...
        });
    }
};
  • Show the converted note on screen, compute how many seconds have passed since recording started, and put the pair into the hashmap.
runOnUiThread(new Runnable() {
            @Override
            public void run() {
                pitchTextView.setText(octav); // update the note shown on screen
                long end = System.currentTimeMillis(); // time the note arrived (wall-clock time)
                double time = (end-start)/(1000.0); // convert to seconds since recording started
                dictionary.put(time, octav); // store <time, note> in the hashmap
            }
        });
  • (Optional) When recording ends, check the values stored in dictionary.
for (Map.Entry<Double, String> entry : dictionary.entrySet()) {
    Log.v("result", entry.getKey() + " " + entry.getValue());
}
  • Pitch detection is performed by the PitchProcessor class, which needs a handler object to receive the real-time pitch detection results.
PitchProcessor(PitchProcessor.PitchEstimationAlgorithm algorithm,
               float sampleRate,
               int bufferSize,
               PitchDetectionHandler handler)

AMDF - A pitch extractor that extracts the Average Magnitude Difference (AMDF) from an audio buffer.
DYNAMIC_WAVELET - An implementation of a dynamic wavelet pitch detection algorithm (See DynamicWavelet), described in a paper by Eric Larson and Ross Maddox, "Real-Time Time-Domain Pitch Tracking Using Wavelets".
FFT_PITCH - Returns the frequency of the FFT-bin with most energy.
FFT_YIN - A YIN implementation with a faster FastYin for the implementation.
MPM - McLeodPitchMethod.
YIN - YIN algorithm.

AudioProcessor pitchProcessor = new PitchProcessor(
        PitchProcessor.PitchEstimationAlgorithm.FFT_YIN,
        22050,
        1024,
        pitchDetectionHandler);
dispatcher.addAudioProcessor(pitchProcessor);
  • Run the dispatcher on a Thread.
Thread audioThread = new Thread(dispatcher, "Audio Thread");
audioThread.start();
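
The sample rate (22050), buffer size (1024), and overlap (0) used above determine how often a pitch estimate can arrive. The sketch below only does that arithmetic, on the assumption that the handler is called once per buffer and that the dispatcher advances by bufferSize - overlap samples each step; BufferTiming is a hypothetical helper name, not a TarsosDSP class.

```java
// Hypothetical helper: timing implied by the dispatcher settings.
final class BufferTiming {
    private BufferTiming() {}

    // Length of one analysis buffer in milliseconds.
    static double bufferMillis(int bufferSize, float sampleRate) {
        return 1000.0 * bufferSize / sampleRate;
    }

    // Pitch estimates per second, assuming one estimate per dispatcher step.
    static double estimatesPerSecond(int bufferSize, int overlap, float sampleRate) {
        return sampleRate / (double) (bufferSize - overlap);
    }
}
```

For 1024 samples at 22050 Hz this comes to roughly 46 ms per buffer, or about 21 estimates per second, which bounds the time resolution of the <time, note> entries.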

 

+ Pitch to Note(ProcessPitch)

This function converts 16.35~7902.13Hz to a note. I copied the values from a conversion table. There were more entries than I expected, so for now I split it out into its own class. In the real implementation it will probably have to be stored in a DB and loaded from there.

public class ProcessPitch {
    // pitch -> key
    public static String processPitch(float pitchInHz){
        String noteText = "Nope";
        if(pitchInHz >= 16.35 && pitchInHz < 17.32) {
            noteText = "C0";
        }
        else if(pitchInHz >= 17.32 && pitchInHz < 18.35) {
            noteText = "C#0";
        }
        else if(pitchInHz >= 18.35 && pitchInHz < 19.45) {
            noteText = "D0";
        }
        else if(pitchInHz >= 19.45 && pitchInHz < 20.60) {
            noteText = "D#0";
        }
        else if(pitchInHz >= 20.60 && pitchInHz < 21.83) {
            noteText = "E0";
        }
        ...
        return noteText;
    }
}
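
As an alternative to a long if/else table, the note can be computed from the frequency with the equal-temperament relation midi = 69 + 12 * log2(f / 440), where 69 is the MIDI number of A4 (440 Hz). This is a swapped-in technique, not the method used in this post, and it behaves slightly differently: it rounds to the nearest note instead of treating each note's frequency as the lower bound of its range. NoteFormula is a hypothetical class name.

```java
// Hypothetical alternative to the lookup table: compute the note from the frequency.
final class NoteFormula {
    private static final String[] NAMES =
            {"C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"};

    private NoteFormula() {}

    static String toNote(float pitchInHz) {
        // Treat non-positive input as "no pitch detected".
        if (pitchInHz <= 0) {
            return "Nope";
        }
        // Nearest MIDI note number in 12-tone equal temperament (A4 = 440 Hz = 69).
        double semitonesFromA4 = 12 * (Math.log(pitchInHz / 440.0) / Math.log(2));
        int midi = (int) Math.round(69 + semitonesFromA4);
        // MIDI 12 is C0 (about 16.35 Hz); anything lower is outside the table's range.
        if (midi < 12) {
            return "Nope";
        }
        int octave = midi / 12 - 1;
        return NAMES[midi % 12] + octave;
    }
}
```

For example, NoteFormula.toNote(440f) yields "A4" and NoteFormula.toNote(16.35f) yields "C0".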

 
