<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://sparkjiao.github.io/feed.xml" rel="self" type="application/atom+xml" /><link href="https://sparkjiao.github.io/" rel="alternate" type="text/html" /><updated>2026-03-22T01:55:39-07:00</updated><id>https://sparkjiao.github.io/feed.xml</id><title type="html">Fangkai Jiao</title><subtitle>继续走八千里路 三十功名尘与土</subtitle><author><name>Fangkai Jiao</name><email>jiaofangkai [AT] hotmail [dot] com</email></author><entry><title type="html">Future Blog Post</title><link href="https://sparkjiao.github.io/posts/2012/08/blog-post-4/" rel="alternate" type="text/html" title="Future Blog Post" /><published>2199-01-01T00:00:00-08:00</published><updated>2199-01-01T00:00:00-08:00</updated><id>https://sparkjiao.github.io/posts/2012/08/future-post</id><content type="html" xml:base="https://sparkjiao.github.io/posts/2012/08/blog-post-4/"><![CDATA[<p>This post will show up by default. To disable scheduling of future posts, edit <code class="language-plaintext highlighter-rouge">config.yml</code> and set <code class="language-plaintext highlighter-rouge">future: false</code>.</p>]]></content><author><name>Fangkai Jiao</name><email>jiaofangkai [AT] hotmail [dot] com</email></author><category term="cool posts" /><category term="category1" /><category term="category2" /><summary type="html"><![CDATA[This post will show up by default. 
To disable scheduling of future posts, edit config.yml and set future: false.]]></summary></entry><entry><title type="html">8.22 MSR Talk &amp;amp; Group Meeting Notes</title><link href="https://sparkjiao.github.io/posts/2019/coai-group-dicussion-0822/" rel="alternate" type="text/html" title="8.22 MSR Talk &amp;amp; Group Meeting Notes" /><published>2019-08-22T00:00:00-07:00</published><updated>2019-08-22T00:00:00-07:00</updated><id>https://sparkjiao.github.io/posts/2019/Talk%20&amp;%20%E7%BB%84%E4%BC%9A</id><content type="html" xml:base="https://sparkjiao.github.io/posts/2019/coai-group-dicussion-0822/"><![CDATA[<h2 id="microsoft-research-talk">Microsoft Research Talk</h2>

<h3 id="machine-reading-for-precision-medicine"><strong>Machine Reading for Precision Medicine</strong></h3>

<p><strong>Annotation Bottleneck</strong></p>

<p><strong>Document-Level N-ary Relation Extraction</strong></p>

<h4 id="self-supervised-learning"><strong>Self-supervised Learning</strong></h4>

<p>Knowledge Base</p>

<p>Distant supervision: relations from a database or knowledge graph serve as noisy labels.</p>
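
<p>A minimal sketch of the idea, assuming a toy knowledge base and plain substring matching (the facts, the sentences, and the helper name <code class="language-plaintext highlighter-rouge">distant_labels</code> are all hypothetical, not from the talk): every sentence that mentions both entities of a known fact inherits that fact's relation as its label.</p>

```python
# Hypothetical toy knowledge base mapping (drug, gene) pairs to a relation.
KNOWLEDGE_BASE = {
    ("gefitinib", "EGFR"): "targets",
    ("imatinib", "ABL1"): "targets",
}


def distant_labels(sentences):
    """Label each sentence that mentions both entities of a known fact."""
    labeled = []
    for sentence in sentences:
        text = sentence.lower()
        for (drug, gene), relation in KNOWLEDGE_BASE.items():
            if drug.lower() in text and gene.lower() in text:
                labeled.append((sentence, drug, gene, relation))
    return labeled


sentences = [
    "Gefitinib response was observed in patients with EGFR mutations.",
    "Gefitinib was discontinued before EGFR testing was completed.",
    "Imatinib targets the ABL1 kinase.",
]
examples = distant_labels(sentences)
```

<p>The second sentence is labeled even though it does not express the relation, which is the kind of noise the note refers to.</p>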

<p>Knowledge -&gt; Probabilistic Logic -&gt; Distantly Supervised Labels -&gt; Deep Learning</p>

<h4 id="neural-architecture">Neural Architecture</h4>

<p>Graph LSTM</p>

<p>Graph -&gt; Exploit rich linguistic structures</p>

<h4 id="multi-scale-representation-learning">Multi-scale Representation Learning</h4>

<p><strong>for Document-Level relation extraction</strong></p>

<p>Input Text -&gt; Mention-level representations -&gt; Entity-level representation -&gt; final predictions</p>
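
<p>The pipeline above can be sketched as follows, assuming mean pooling as the aggregation step (the pooling choice, the vectors, and the entity names are illustrative assumptions, not the method from the talk): mention-level vectors are grouped by entity, pooled into one entity-level vector each, and the pooled vectors form the input to the final predictor.</p>

```python
def mean_pool(vectors):
    """Average a list of equal-length vectors element-wise."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


# Hypothetical mention-level vectors, grouped by the entity each mention refers to.
mention_vectors = {
    "gefitinib": [[1.0, 0.0], [0.0, 1.0]],
    "EGFR": [[2.0, 2.0]],
}

# Entity-level representation: one pooled vector per entity.
entity_vectors = {name: mean_pool(vs) for name, vs in mention_vectors.items()}

# Input to the final predictor: here simply the concatenated entity vectors.
features = entity_vectors["gefitinib"] + entity_vectors["EGFR"]
```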

<p>“Document-level n-ary relation extraction with Multi-scale representation learning” - NAACL-19</p>

<h4 id="application">Application</h4>

<p>Machine reading for assisted curation, supporting human curators.</p>

<ul>
  <li>molecular tumor board</li>
  <li>EMR: 60%-80% of the information is in unstructured text</li>
</ul>

<h4 id="some-discussion">Some discussion</h4>

<p>BERT is too general. Prior knowledge is very useful and could be injected while training BERT.</p>

<h2 id="coai-lab-disscussion">CoAI Lab Discussion</h2>

<h3 id="pre-training-language-model"><strong>Pre-training Language Model</strong></h3>

<p><strong>Auto-regressive language models perform as well as non-autoregressive auto-encoders as pre-training methods.</strong></p>]]></content><author><name>Fangkai Jiao</name><email>jiaofangkai [AT] hotmail [dot] com</email></author><category term="notes" /><category term="talk" /><summary type="html"><![CDATA[Microsoft Research Talk]]></summary></entry><entry><title type="html">6.27 Group Meeting Notes</title><link href="https://sparkjiao.github.io/posts/2019/coai-group-dicussion-0627/" rel="alternate" type="text/html" title="6.27 Group Meeting Notes" /><published>2019-06-27T00:00:00-07:00</published><updated>2019-06-27T00:00:00-07:00</updated><id>https://sparkjiao.github.io/posts/2019/%E7%BB%84%E4%BC%9A%E7%AC%94%E8%AE%B0</id><content type="html" xml:base="https://sparkjiao.github.io/posts/2019/coai-group-dicussion-0627/"><![CDATA[<h4 id="persuasive-communication">Persuasive Communication</h4>

<h4 id="xlnet">XLNet</h4>]]></content><author><name>Fangkai Jiao</name><email>jiaofangkai [AT] hotmail [dot] com</email></author><category term="notes" /><summary type="html"><![CDATA[Persuasive Communication]]></summary></entry><entry><title type="html">5.16 Group Meeting Notes</title><link href="https://sparkjiao.github.io/posts/2019/coai-group-dicussion-0516/" rel="alternate" type="text/html" title="5.16 Group Meeting Notes" /><published>2019-05-16T00:00:00-07:00</published><updated>2019-05-16T00:00:00-07:00</updated><id>https://sparkjiao.github.io/posts/2019/%E7%BB%84%E4%BC%9A%E7%AC%94%E8%AE%B0</id><content type="html" xml:base="https://sparkjiao.github.io/posts/2019/coai-group-dicussion-0516/"><![CDATA[<h3 id="weak-supervision"><strong>Weak Supervision</strong></h3>

<p>Entropy -&gt; uniform distribution <br />
\(-\sum_{i=1}^{m} q_i \log d_i\) ?? <br />
Self-Training/Self-Supervised</p>
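
<p>One possible reading of the note above, sketched under assumptions: the sum is a cross-entropy between a target distribution q and a prediction d, and "entropy -&gt; uniform distribution" refers to entropy being largest for the uniform distribution. The symbols q and d come from the note; the function names and the numbers are illustrative.</p>

```python
import math


def entropy(p):
    """Shannon entropy in nats; zero-probability terms contribute nothing."""
    return -sum(p_i * math.log(p_i) for p_i in p if p_i != 0)


def cross_entropy(q, d):
    """Cross-entropy -sum_i q_i log d_i between target q and prediction d."""
    return -sum(q_i * math.log(d_i) for q_i, d_i in zip(q, d) if q_i != 0)


uniform = [0.25, 0.25, 0.25, 0.25]
peaked = [0.7, 0.1, 0.1, 0.1]

h_uniform = entropy(uniform)  # equals log(4), the maximum over 4 classes
h_peaked = entropy(peaked)    # strictly smaller than h_uniform
ce = cross_entropy(uniform, peaked)  # never below entropy(uniform)
```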

<p>Relieve bias?</p>

<p>EM optimization (similarity-based loss + MIL/hinge loss/VAE)</p>

<p>Get higher-level supervision over unlabeled data from subject-matter experts (SMEs):</p>

<p>Heuristic rules</p>

<p>Distant Supervision, Constraints, Expected Distribution, Invariances.</p>

<h3 id="external-knowledge"><strong>External Knowledge</strong></h3>

<h4 id="label-free-emnlp-2018">Label-free EMNLP 2018</h4>

<p>Based on TransE pretrained embeddings: start entity + sentence embedding (the relation vector predicted by the neural model) == end entity</p>]]></content><author><name>Fangkai Jiao</name><email>jiaofangkai [AT] hotmail [dot] com</email></author><category term="notes" /><summary type="html"><![CDATA[Weak Supervision]]></summary></entry><entry><title type="html">5.9 Group Meeting Notes</title><link href="https://sparkjiao.github.io/posts/2019/coai-group-dicussion-0509/" rel="alternate" type="text/html" title="5.9 Group Meeting Notes" /><published>2019-05-09T00:00:00-07:00</published><updated>2019-05-09T00:00:00-07:00</updated><id>https://sparkjiao.github.io/posts/2019/%E7%BB%84%E4%BC%9A%E7%AC%94%E8%AE%B0</id><content type="html" xml:base="https://sparkjiao.github.io/posts/2019/coai-group-dicussion-0509/"><![CDATA[<h3 id="multi-turn-chatbots"><strong>Multi-Turn Chatbots</strong></h3>

<p><strong>Context Reasoning</strong></p>

<p>Multi-Memory (for each turn, or for each person in a <strong>chat group</strong>)</p>

<p>Reinforcement Learning</p>

<p>For structured knowledge or unstructured knowledge</p>]]></content><author><name>Fangkai Jiao</name><email>jiaofangkai [AT] hotmail [dot] com</email></author><category term="notes" /><summary type="html"><![CDATA[Multi-Turn Chatbots]]></summary></entry><entry><title type="html">4.27 CIPS SMP Notes</title><link href="https://sparkjiao.github.io/posts/2019/0427/cips-smp-notes/" rel="alternate" type="text/html" title="4.27 CIPS SMP Notes" /><published>2019-04-27T00:00:00-07:00</published><updated>2019-04-27T00:00:00-07:00</updated><id>https://sparkjiao.github.io/posts/2019/0427/cips-smp-note</id><content type="html" xml:base="https://sparkjiao.github.io/posts/2019/0427/cips-smp-notes/"><![CDATA[<p>4.27 Chinese Information Processing Society of China (CIPS), Social Media Processing (SMP) Technical Committee: First “Social Robot” Forum</p>

<h2 id="pretraining-in-nlp-and-cv">Pretraining in NLP and CV</h2>

<h3 id="ulmfit">ULMFiT</h3>
<p>Two-stage fine-tuning</p>

<h3 id="cove">CoVe</h3>
<p>Supervised Language Model</p>

<h3 id="ms-knowledgebert-for-semantic-parsing">MS: KnowledgeBERT for Semantic Parsing</h3>
<p>MSParS: A Multi-Perspective Semantic Parsing Dataset for Knowledge-based Question Answering</p>

<h3 id="pre-training-based-natural-language-generation">Pre-training-based Natural Language Generation</h3>
<p>BERT output -&gt; draft (text summarization) -&gt; Transformer decoder -&gt; output</p>

<h3 id="pre-training-in-imagenetmulti-modal">Pre-training in ImageNet/Multi-Modal</h3>
<p>ImageBERT
Image object + text -&gt; BERT</p>

<h3 id="videobertgoogle">VideoBERT (Google)</h3>
<p>videos -&gt; frames -&gt; tokens -&gt; BERT</p>

<h3 id="application-video-qa-summarization-and-chat">Application: Video QA, Summarization and Chat.</h3>
<p>Predict a video span, much like answer-span prediction in machine reading comprehension.</p>

<h2 id="nlu-in-task-oriented-dialog-system">NLU in Task-Oriented Dialog System</h2>
<p>intent detection, slot filling, state tracking.</p>
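
<p>To make the first two sub-tasks concrete, here is a small sketch, assuming the common BIO tagging scheme for slot filling (the utterance, the intent name, the slot names, and the helper <code class="language-plaintext highlighter-rouge">extract_slots</code> are hypothetical): intent detection assigns one label to the whole utterance, while slot filling assigns one tag per token.</p>

```python
def extract_slots(tokens, tags):
    """Collect slot values from BIO tags: B-x starts a slot, I-x continues it."""
    slots = {}
    current_name = None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            current_name = tag[2:]
            slots[current_name] = token
        elif tag.startswith("I-") and current_name == tag[2:]:
            slots[current_name] += " " + token
        else:
            current_name = None
    return slots


# Hypothetical annotated utterance from a flight-booking domain.
tokens = ["book", "a", "flight", "to", "New", "York", "tomorrow"]
tags = ["O", "O", "O", "O", "B-destination", "I-destination", "B-date"]

# One utterance-level intent plus token-level slots form the semantic frame.
frame = {"intent": "book_flight", "slots": extract_slots(tokens, tags)}
```

<p>State tracking would then merge frames like this one across turns of the dialog.</p>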

<h3 id="intent-detection">Intent Detection</h3>
<h3 id="slot-filling">Slot Filling</h3>
<h3 id="dialog-management">Dialog Management</h3>
<p>Reinforcement Learning</p>

<h3 id="dialog-generation">Dialog Generation</h3>
<p>based on Pattern/LM/Seq2Seq</p>

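<h3 id="对话技术平台">Dialog Technology Platform</h3>
<p>The goal is for small and medium-sized developers to build their own dialog systems simply by uploading their own data.</p>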

<p>HIT (Harbin Institute of Technology) chatbot “Benben” (笨笨) / task-oriented dialog system</p>

<h3 id="few-shot-learning">Few-shot Learning</h3>
<h4 id="小样本的训练">Training with Few Samples</h4>
<p>Automatic augmentation of labeled data: Seq2Seq/Pre-training</p>

<p>How can few-shot learning be applied to sequence labeling (slot filling) tasks?</p>

<h3 id="draw-back-of-traditional-dialong-system">Drawback of Traditional Dialog Systems</h3>
<p>They depend on the previous dialog.</p>

<p>Joint Training: intent detection + slot filling, based on Stack-Propagation (important) and multi-task learning</p>

<h3 id="evaluation-of-dialog-system-techniques">Evaluation of Dialog System Techniques</h3>
<p>ECDT2017-2018</p>

<h2 id="alibaba">Alibaba</h2>

<p>Ensemble Learning
Hybrid CNN <br />
Cold-start, fast end-to-end testing: Deep Reinforcement Learning</p>

<h3 id="mrc-for-unstructured-data">MRC for Unstructured Data</h3>
<p>based on SLQA -&gt; EMNLP</p>

<h2 id="open-domain-non-oriented-dialog-system">Open-Domain Non-Oriented Dialog System</h2>

<h3 id="retrieval-based-chatbot">Retrieval-Based Chatbot</h3>

<h3 id="multi-view-relevence-interestingness-informativeness">Multi-View: Relevance, Interestingness, Informativeness</h3>

<h3 id="non-sentential-utterance-resolution">Non-Sentential Utterance Resolution</h3>

<h3 id="retrieval-from-non-dialogue-corpus">Retrieval from Non-Dialogue Corpus</h3>
<ul>
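  <li>How does this differ from reading comprehension? Domain-specific knowledge can be extremely fine-grained, so an open-domain approach is needed (that was roughly the point).</li>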
</ul>

<h3 id="neural-responce-generation">Neural Response Generation</h3>
<ul>
  <li>The “Bland Response” Problem: I don’t know/Well/Great/Fine (Jiwei Li)
    <ul>
      <li>Adversarial Training</li>
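      <li>The discriminator produces fairly fine-grained output, so back-propagating a single score as the reward may be problematic; hence ???? is multiplied directly with the embedding. I did not note down the rest (orz); I think this was covered in a group meeting (XD).</li>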
    </ul>
  </li>
  <li>The “Myopia Problem” of Beam Search</li>
</ul>
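
<p>One way to picture the myopia issue from the last item above, sketched under assumptions (the toy next-token table and the function name are hypothetical, and this shows only the pruning aspect of the problem): a narrow beam commits to the prefix that looks best right now and can discard a prefix whose continuation would have scored higher overall.</p>

```python
import math

# Hypothetical toy model: next-token probabilities keyed by the prefix so far.
NEXT_TOKEN_PROBS = {
    (): {"I": 0.6, "The": 0.4},
    ("I",): {"know": 0.55, "see": 0.45},
    ("The",): {"movie": 0.9, "end": 0.1},
}


def beam_search(beam_width, steps=2):
    """Keep the beam_width highest-scoring prefixes after every step."""
    beams = [((), 0.0)]  # (prefix, cumulative log-probability)
    for _ in range(steps):
        candidates = []
        for prefix, score in beams:
            for token, prob in NEXT_TOKEN_PROBS[prefix].items():
                candidates.append((prefix + (token,), score + math.log(prob)))
        candidates.sort(key=lambda item: item[1], reverse=True)
        beams = candidates[:beam_width]
    return beams[0][0]


greedy = beam_search(beam_width=1)  # commits to "I" after the first step
wider = beam_search(beam_width=2)   # keeps "The" alive and finds the better sequence
```

<p>With a beam of width 1 the search returns a sequence of probability 0.33, while width 2 recovers one of probability 0.36 that the narrow beam pruned after the first token.</p>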

<h3 id="child-friendly-social-chatbot">Child Friendly Social Chatbot</h3>
<ul>
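  <li>Avoid topics unsuitable for children, such as dating, marriage, and having children.</li>
  <li>First filter with a language model trained on a children's corpus.</li>
  <li>The generation model is itself fairly conservative; after the double filtering, the probability is far lower than the real-world probability.</li>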
</ul>

<h2 id="qa">Q&amp;A</h2>

<h3 id="constraint-on-dialog-generation">Constraint on Dialog Generation</h3>]]></content><author><name>Fangkai Jiao</name><email>jiaofangkai [AT] hotmail [dot] com</email></author><category term="notes" /><summary type="html"><![CDATA[4.27 Chinese Information Processing Society of China (CIPS), Social Media Processing (SMP) Technical Committee: First “Social Robot” Forum]]></summary></entry><entry><title type="html">4.25 Group Meeting Notes</title><link href="https://sparkjiao.github.io/posts/2019/coai-group-dicussion-0425/" rel="alternate" type="text/html" title="4.25 Group Meeting Notes" /><published>2019-04-25T00:00:00-07:00</published><updated>2019-04-25T00:00:00-07:00</updated><id>https://sparkjiao.github.io/posts/2019/%E7%BB%84%E4%BC%9A%E7%AC%94%E8%AE%B0</id><content type="html" xml:base="https://sparkjiao.github.io/posts/2019/coai-group-dicussion-0425/"><![CDATA[<h2 id="425大组会">4.25 Full Group Meeting</h2>

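<p>Latent variables in dialog -&gt;
action spaces?</p>

<p>Policy network?</p>

<p>Can latent variables be used in QA?</p>

<p>Latent variable learning in task-oriented settings?</p>

<p>Why can't a result be predicted directly, rather than via latent variables? Is it that supervision makes training easier?</p>

<p>Applications in the open domain?</p>

<p>Graph Attention, Graph Attention Transformer, Graph Attention Network.</p>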

<p>Relational GAN</p>]]></content><author><name>Fangkai Jiao</name><email>jiaofangkai [AT] hotmail [dot] com</email></author><category term="notes" /><summary type="html"><![CDATA[4.25 Full Group Meeting]]></summary></entry></feed>