Deep Learning Has Been Successfully Applied to These Three Major Areas (3)
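Other kinds of preprocessing are applied to both the training set and the test set. Their goal is to put each example into a more canonical form, so that the model has less variation to account for. Reducing the variation in the data can reduce generalization error and also reduce the size of the model needed to fit the training set. Simpler tasks can be solved by smaller models, and simpler solutions generally generalize better. Preprocessing of this kind is usually designed to remove some form of variability in the input that a human designer can easily describe and is confident has no bearing on the task. When training large models on large datasets, this kind of preprocessing is often unnecessary, and it is best to let the model learn which kinds of variability to keep and which to ignore. For example, the AlexNet system for classifying ImageNet has only one preprocessing step: subtracting the mean over the training examples from each pixel (Krizhevsky et al., 2012b).

Dataset Augmentation

As described in Section 7.4, it is easy to improve a classifier's generalization by enlarging the training set with extra copies of the training examples. These extra copies are produced by applying transformations to the original images that do not change their class. Object recognition is a classification task particularly well suited to this form of dataset augmentation, because the class is invariant to many transformations and many geometric operations can easily be applied to the input. As described earlier, classifiers can benefit from random translations or rotations, and in some cases flipping the input also augments the dataset. Specialized computer vision applications use many more advanced transformations for augmentation. These include random perturbation of the colors in an image (Krizhevsky et al., 2012b) and nonlinear geometric distortions of the input (LeCun et al., 1998c).

Speech Recognition

The task of speech recognition is to map an acoustic signal containing a spoken natural language utterance onto the corresponding sequence of words intended by the speaker. Let X = (x(1), x(2), …, x(T)) denote the sequence of acoustic input vectors (traditionally produced by splitting the signal into 20 ms frames). Many speech recognition systems preprocess the input signal with specialized hand-designed methods to extract features, but some deep learning systems (Jaitly and Hinton, 2011) learn features directly from the raw input. Let y = (y1, y2, …, yN) denote the target output sequence (usually a sequence of words or characters). The automatic speech recognition (ASR) task is to construct a function f*_ASR that computes the most probable linguistic sequence y given the acoustic sequence X:

f*_ASR(X) = argmax_y P*(y | X)

where P* is the true conditional distribution of the target y given the input X.

From the 1980s until about 2009 to 2012, state-of-the-art speech recognition systems combined hidden Markov models (HMMs) with Gaussian mixture models (GMMs). The GMMs modeled the relationship between acoustic features and phonemes (Bahl et al., 1987), while the HMMs modeled the sequence of phonemes. The GMM-HMM model treats the speech signal as generated by the following process: first, an HMM generates a sequence of phonemes and discrete sub-phonemic states (such as the beginning, middle, and end of each phoneme); then a GMM turns each discrete state into a brief segment of audio signal.

Although GMM-HMM systems dominated ASR until recently, speech recognition was one of the first areas in which neural networks were applied successfully. From the late 1980s to the early 1990s, a large number of speech recognition systems used neural networks (Bourlard and Wellekens, 1989; Waibel et al., 1989; Robinson and Fallside, 1991; Bengio et al., 1991, 1992; Konig et al., 1996). At the time, neural network based ASR performed about as well as GMM-HMM systems. For example, Robinson and Fallside (1991) achieved a 26% phoneme error rate on the TIMIT dataset (Garofolo et al., 1993), which has 39 phonemes to discriminate between. This result was better than or comparable to HMM-based results. Since then, TIMIT has been a benchmark for phoneme recognition, playing a role in speech recognition similar to the one MNIST plays in object recognition. However, because of the complex engineering involved in speech recognition software and the enormous effort already invested in GMM-HMM systems, industry saw no pressing need to switch to neural networks. As a result, until the late 2000s, researchers in both academia and industry mostly used neural networks to learn additional features for GMM-HMM systems.

Later, with much larger and deeper models and much larger datasets, recognition accuracy improved dramatically when neural networks replaced GMMs for the task of associating acoustic features with phonemes (or sub-phonemic states). Starting in 2009, speech researchers applied a form of deep learning based on unsupervised learning to speech recognition. This approach was based on training undirected probabilistic models called restricted Boltzmann machines (RBMs) to model the input data. RBMs are described in Part III. To solve speech recognition tasks, unsupervised pretraining was used to build a deep feedforward network whose layers were each initialized by training an RBM. These networks take a spectral acoustic representation in a fixed-size input window (centered on the current frame) and predict the conditional probabilities of the HMM states for that center frame. Training such networks significantly improved the recognition rate on TIMIT (Mohamed et al., 2009, 2012a), bringing the phoneme error rate down from about 26% to 20.7%. See Mohamed et al. (2012b) for a detailed analysis of why these models succeed.

One extension of the basic phone recognition pipeline added speaker-adaptive features (Mohamed et al., 2011), which reduced the error rate further. This was quickly followed by work that expanded the architecture from phoneme recognition (the focus of TIMIT) to large-vocabulary speech recognition (Dahl et al., 2012), which involves recognizing not only phonemes but also sequences of words from a large vocabulary. Deep networks for speech recognition eventually moved from RBM-based pretraining to techniques such as rectified linear units and dropout (Zeiler et al., 2013; Dahl et al., 2013). By that time, several of the major speech groups in industry had begun collaborating with academic researchers. Hinton et al. (2012a) describe the breakthroughs these collaborations produced, which are now widely deployed in products such as mobile phones.

Later, as these groups worked with larger and larger labeled datasets and incorporated various methods for initializing, training, and configuring the architecture of deep networks, they found that the unsupervised pretraining phase was either unnecessary or brought no significant improvement.

Measured by word error rate, these breakthroughs in recognition performance were unprecedented (an improvement of about 30%). They followed a period of roughly ten years in which the error rates of traditional GMM-HMM technology had stagnated, even though training sets kept growing over time (see figure 2.4 of Deng and Yu (2014)). This led the speech recognition community to shift rapidly toward deep learning. Within about two years, most industrial speech recognition products incorporated deep neural networks, and this success spurred a new wave of research into deep learning algorithms and architectures for ASR that continues today.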
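
The input representation described in the speech recognition section (a signal cut into 20 ms frames, with a fixed-size window centered on the current frame fed to the network) can be sketched as follows. This is only an illustrative sketch under stated assumptions: the function names `split_into_frames` and `context_window` are hypothetical, the frames are non-overlapping, raw samples stand in for the spectral features the text mentions, and windows that run past the utterance boundaries are padded by repeating the edge frame.

```python
from typing import List, Sequence


def split_into_frames(signal: Sequence[float], sample_rate: int,
                      frame_ms: float = 20.0) -> List[List[float]]:
    """Cut a 1-D signal into consecutive, non-overlapping frames of frame_ms."""
    frame_len = int(sample_rate * frame_ms / 1000)
    if frame_len <= 0:
        raise ValueError("frame length must be at least one sample")
    n_frames = len(signal) // frame_len  # a trailing partial frame is dropped
    return [list(signal[i * frame_len:(i + 1) * frame_len])
            for i in range(n_frames)]


def context_window(frames: List[List[float]], center: int,
                   radius: int) -> List[float]:
    """Concatenate frames center-radius .. center+radius into one input vector.

    Indices outside the utterance are clamped to the first or last frame
    (edge padding). This padding rule is an assumption of the sketch.
    """
    last = len(frames) - 1
    window: List[float] = []
    for t in range(center - radius, center + radius + 1):
        window.extend(frames[min(max(t, 0), last)])
    return window


# Toy signal: 1 second sampled at 1 kHz, so each 20 ms frame holds 20 samples.
signal = [float(i) for i in range(1000)]
frames = split_into_frames(signal, sample_rate=1000)

# Network input for the first frame, with two frames of context on each side.
x = context_window(frames, center=0, radius=2)
```

In a system of the kind the text describes, a vector like `x` would be the input to a feedforward network whose outputs are the conditional probabilities of the HMM states for the center frame. The network itself is omitted here.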