This article looks at how to implement an RNN text generation model in code and walks through the approach from a practical standpoint. Hopefully you will take something useful away from it.
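Text generation looks like one of the more entertaining projects for newcomers to machine learning and NLP, but it is also a very difficult one. Fortunately, there are plenty of excellent resources online for learning how RNNs are used for text generation, ranging from theory to in-depth technical walkthroughs. All of them share one thing: at some point in the process, you have to build an RNN model and tune its parameters to get the job done.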
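Text generation is worthwhile work, especially as a learning exercise, but what if you want to work at a higher level of abstraction? What if you are a data scientist who needs an RNN text generator as a building block to plug into a project? Or perhaps, as a newcomer, you simply want to experiment or sharpen your skills. In either case, take a look at the textgenrnn project, which lets you train a text-generating neural network of any size and complexity on any text dataset with a few lines of code. textgenrnn was developed by data scientist Max Woolf.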
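textgenrnn is built on top of Keras and TensorFlow and can generate text at either the character level or the word level. The network architecture uses attention weighting to speed up training and improve quality, and it exposes a large number of hyperparameters, such as the RNN size, the number of RNN layers, and whether to use bidirectional RNNs. You can read more about textgenrnn, its features, and its architecture on GitHub or in the accompanying introductory blog post.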
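As a rough illustration of the hyperparameters mentioned above, the sketch below trains a fresh model with a custom size, depth, and directionality. The keyword names (new_model, rnn_size, rnn_layers, rnn_bidirectional, word_level, max_length) are taken from the textgenrnn README as best recalled and should be checked against the installed version; the values, the model name, and the train_custom helper are hypothetical and do not come from this article.

```python
# Illustrative hyperparameter values only; none of them come from the article.
MODEL_CONFIG = {
    "rnn_size": 128,            # units per recurrent layer
    "rnn_layers": 2,            # number of stacked recurrent layers
    "rnn_bidirectional": True,  # read each sequence in both directions
    "word_level": False,        # False = character-level model
    "max_length": 40,           # tokens of context used to predict the next one
}


def train_custom(path, num_epochs=10, config=MODEL_CONFIG):
    """Hypothetical helper: train a new textgenrnn model with custom settings."""
    # Imported lazily so the config can be inspected without TensorFlow installed.
    from textgenrnn import textgenrnn

    textgen = textgenrnn(name="custom_model")
    # new_model=True is assumed to build a fresh network from the settings
    # above instead of fine-tuning the bundled pretrained weights.
    textgen.train_from_file(path, new_model=True, num_epochs=num_epochs, **config)
    return textgen
```

Calling train_custom('trump-tweets.txt') would then take the place of the default training call; larger values generally mean longer training times.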
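This article uses Donald Trump's tweets from January 1, 2014 to June 11, 2018, a period that covers tweets from both before and after he took office as US President (sourced from the Trump Twitter Archive). Only the tweets within that date range are selected, and their text is saved to a text file named trump-tweets.txt.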
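The article does not show how the file was produced. One possible way to do the filtering step, assuming the archive has been exported as a CSV file, is sketched below. The column names (text, created_at), the timestamp format, and the extract_tweets helper are all assumptions for illustration and will need adjusting to the actual export.

```python
import csv
from datetime import datetime

# Assumed column names and timestamp format for the archive export;
# adjust them to match the actual file.
TEXT_COL = "text"
DATE_COL = "created_at"
DATE_FMT = "%m-%d-%Y %H:%M:%S"

# Date range stated in the article (inclusive of the whole last day).
START = datetime(2014, 1, 1)
END = datetime(2018, 6, 11, 23, 59, 59)


def extract_tweets(csv_path, out_path, start=START, end=END):
    """Write the text of every tweet dated within [start, end], one per line.

    Returns the number of tweets written.
    """
    kept = 0
    with open(csv_path, newline="", encoding="utf-8") as src, \
            open(out_path, "w", encoding="utf-8") as dst:
        for row in csv.DictReader(src):
            try:
                created = datetime.strptime(row[DATE_COL], DATE_FMT)
            except (KeyError, ValueError, TypeError):
                continue  # skip rows with a missing or malformed timestamp
            if start <= created <= end:
                # Collapse internal whitespace and newlines so each tweet
                # occupies exactly one line of the output file.
                text = " ".join((row.get(TEXT_COL) or "").split())
                if text:
                    dst.write(text + "\n")
                    kept += 1
    return kept
```

For example, extract_tweets('archive.csv', 'trump-tweets.txt') would write the training file used in the rest of this article, where archive.csv is a placeholder name for the exported data.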
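Now let's look at how simple it is to generate text with textgenrnn. The following 4 lines import the library, create a text generation object, train the model on trump-tweets.txt for 10 epochs, and then generate some sample tweets.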
from textgenrnn import textgenrnn
textgen = textgenrnn()
textgen.train_from_file('trump-tweets.txt', num_epochs=10)
textgen.generate(5)
After about 30 minutes (training time depends on your hardware), the model at epoch 10 produces the following:
My @FoxNews will be self finally complaining about me that so he is a great day and companies and is starting to report the president in safety and more than any mention of the bail of the underaches to the construction and freedom and efforts the politicians and expensive meetings should have bee The world will be interviewed on @foxandfriends at 7:30pm. Enjoy! .@JebBush and Fake News Media is a major place in the White House in the service and sense where the people of the debate and his show of many people who is a great press considering the GREAT job on the way to the U.S. A the best and people in the biggest! Thank you! New Hampshire Trump Int'l Hotel Leadership Barrier Lou Clinton is a forever person politically record supporters have really beginning in the media on the heart of the bad and women who have been succeeded and before you can also work the people are there a time strong and send out the world with Join me in Maryland at 7:00 A.M. and happened to the WALL and be true the longer of the same sign into the Fake News Media will be a great honor to serve that the Republican Party will be a great legal rate the media with the Best Republican Party and the American people that will be the bill by a...
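Politics aside, these generated tweets are not bad considering that we trained on only about 12,000 tweets for just 10 epochs. textgenrnn's temperature defaults to 0.5; raising it yields more creative tweets. Let's see what happens with a higher value: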
textgen.generate(5, temperature=0.9)
“Via-can see this Democrats were the opening at GREAT ENSUS CALL! .@GovSeptorald Taster is got to that the subcent Vote waiting them. @Calkers Major President Obama will listen for the disaster! Grateful and South Carolina so his real ability and much better-- or big crisis on many signing!It is absolutely dumbers for well tonight. Love us in the great inherition of fast. With bill of badly to forget the greatest puppet at my wedds. No Turnberry is "bigger.” - All
These results are less convincing. What happens if we lower the temperature instead? The output becomes more stable:
textgen.generate(5, temperature=0.1)
The Fake News Media is a great people of the president was a great people of the many people who would be a great people of the president was a big crowd of the statement of the media is a great people of the people of the statement of the people of the people of the world with the statement of th Thank you @TrumpTowerNY #Trump2016 https://t.co/25551R58350Thank you for your support! #Trump2016 https://t.co/7eN53P55cThe people of the U.S. has been a great people of the presidential country is a great time and the best thing that the people of the statement of the media is the people of the state of the best thing that the people of the statement of the statement of the problem in the problem and success and t Thank you @TheBrodyFile tonight at 8:00 A.M. Enjoy!
Comparing the high-temperature and low-temperature examples gives a clearer picture of how the project behaves.
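To see why temperature has this effect, here is a minimal sketch of how temperature sampling is usually implemented: each predicted probability is raised to the power 1/T and the result is renormalised before sampling. This is a generic illustration and not textgenrnn's own code; the function names and the example probabilities are made up.

```python
import math
import random


def apply_temperature(probs, temperature):
    """Rescale a probability distribution: p_i ** (1 / T), then renormalise."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    if not any(p > 0 for p in probs):
        raise ValueError("at least one probability must be positive")
    # Work in log space for numerical stability; zero-probability entries stay at zero.
    logits = [math.log(p) / temperature if p > 0 else float("-inf") for p in probs]
    peak = max(logits)
    weights = [math.exp(value - peak) for value in logits]
    total = sum(weights)
    return [w / total for w in weights]


def sample_index(probs, temperature, rng=random):
    """Draw one index from the temperature-adjusted distribution."""
    adjusted = apply_temperature(probs, temperature)
    return rng.choices(range(len(adjusted)), weights=adjusted, k=1)[0]


if __name__ == "__main__":
    next_char_probs = [0.5, 0.3, 0.2]  # made-up next-character probabilities
    for t in (0.1, 0.5, 0.9):
        print(t, [round(p, 3) for p in apply_temperature(next_char_probs, t)])
```

At a low temperature nearly all of the probability mass moves to the most likely choice, which matches the repetitive output seen at 0.1. At a higher temperature the distribution stays closer to the model's raw predictions, so less likely characters are picked more often and the text becomes more varied but less coherent.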