Comprehensive Survey on Community Question Answering
First Author#1, Second Author*2, Third Author#3
#First-Third Department, First-Third University
Address Including Country Name
Abstract—Web search engines return a ranked list of related documents for a
user's keywords, ordered by aspects such as popularity measures, keyword match,
and document access frequency. Users then have to check each document to find
the desired information, which makes information retrieval a time-consuming
process. Community Question Answering (CQA) systems aim to deliver short,
precise answers instead of possibly irrelevant documents. CQA is a specialized
information retrieval application that can retrieve the right answers to
questions posed in natural language. Natural Language Processing (NLP)
techniques are used to process a question; the system then searches for the
information relevant to the question to determine the answer accurately.
Keywords—Community Question Answering system, Information Retrieval, Natural Language Processing.
In recent years, large volumes of historical web pages have become a source of
valuable information, mainly from sites such as traditional Frequently Asked
Questions (FAQ) archives and emerging Community Question Answering (CQA)
services such as Yahoo! Answers, Live QnA, and Baidu Zhidao. The content of
these sites is usually organized as questions and their associated answers,
together with metadata: askers assign categories to their questions, and
respondents reply with answers, from which the best answers are selected. As a
result, CQA archives are valuable resources for tasks such as question
answering and knowledge mining. One fundamental task in reusing CQA content is
finding questions similar to a newly queried question, because questions are
the keys to accessing the knowledge in CQA. The best answers to those similar
questions can then be used to answer the queried question; this process is
known as question retrieval.
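The question-retrieval process described above (find the archived question most similar to a new query, then reuse its best answer) can be sketched as follows. This is a minimal sketch that assumes a plain TF-IDF bag-of-words representation with cosine similarity; the archive format and the helper names (`tokenize`, `build_idf`, `retrieve_answer`) are hypothetical. Purely lexical matching like this suffers from the lexical gap, which the approaches surveyed below try to bridge with translation models, topic models, or word embeddings.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep alphanumeric word tokens only.
    return re.findall(r"\w+", text.lower())


def build_idf(docs):
    # Smoothed inverse document frequency over the archived questions.
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    return {w: math.log((1 + n) / (1 + c)) + 1.0 for w, c in df.items()}


def tfidf(tokens, idf):
    # Words never seen in the archive get weight 0.
    tf = Counter(tokens)
    return {w: c * idf.get(w, 0.0) for w, c in tf.items()}


def cosine(u, v):
    dot = sum(x * v.get(w, 0.0) for w, x in u.items())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def retrieve_answer(query, archive):
    """archive: list of (question, best_answer) pairs (hypothetical format).

    Returns (most_similar_question, its_best_answer, similarity).
    """
    docs = [tokenize(q) for q, _ in archive]
    idf = build_idf(docs)
    qvec = tfidf(tokenize(query), idf)
    scored = [(cosine(qvec, tfidf(doc, idf)), i) for i, doc in enumerate(docs)]
    score, best = max(scored)
    return archive[best][0], archive[best][1], score
```

For example, with an archive containing ("How do I reset my router password?", "Hold the reset button for 30 seconds.") and ("What is the capital of France?", "Paris."), the query "How can I reset a router password" retrieves the first pair and reuses its answer.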
QUESTION ANSWERING SYSTEM
QUESTION ANSWERING TYPES
Domain classification: An open-domain question answering system (QAS) deals
with questions about nearly everything, whereas a closed-domain QAS deals with
questions in a specific domain.
Data source classification: Structured data refers to relational databases,
whereas unstructured data refers to documents or web pages in …
Answer classification: An extracted answer lies directly in the database,
whereas a generated answer must be generated or formulated from the retrieved
data.
GENERAL ARCHITECTURE OF QUESTION ANSWERING SYSTEM
2.1.1 Question Classification
In question processing, the system should first analyse the type of question.
Table 1 shows question words, types of questions, and answers. Questions can be
classified into two categories: questions with 'WH' question words such as
what, where, who, whom, which, how, and why, and questions with modal or
auxiliary verbs whose answers are Yes/No.
Table 1: Question classification and answer
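The two-way split between WH questions and Yes/No questions can be approximated with a few surface rules. The following is a minimal rule-based sketch, assuming that a leading modal or auxiliary verb marks a Yes/No question and that the first WH word otherwise decides the expected answer type; the answer-type labels in `WH_ANSWER_TYPES` are hypothetical and do not reproduce Table 1.

```python
import re

# Hypothetical mapping from WH word to an expected answer type; it is
# illustrative only and does not reproduce the contents of Table 1.
WH_ANSWER_TYPES = {
    "who": "PERSON",
    "whom": "PERSON",
    "where": "LOCATION",
    "when": "DATE/TIME",
    "which": "ENTITY",
    "what": "ENTITY/DEFINITION",
    "how": "MANNER/QUANTITY",
    "why": "REASON",
}

# Modal and auxiliary verbs that typically open a Yes/No question.
AUX_VERBS = {
    "is", "are", "was", "were", "am", "do", "does", "did", "has", "have",
    "had", "can", "could", "will", "would", "shall", "should", "may",
    "might", "must",
}


def classify_question(question):
    """Return (category, expected_answer_type) for a natural-language question."""
    tokens = re.findall(r"[a-z']+", question.lower())
    if not tokens:
        return ("unknown", "UNKNOWN")
    # A leading modal/auxiliary verb signals a Yes/No question.
    if tokens[0] in AUX_VERBS:
        return ("yes/no", "YES/NO")
    # Otherwise use the first WH word, which need not be sentence-initial
    # (e.g. "In which year ...").
    for tok in tokens:
        if tok in WH_ANSWER_TYPES:
            return ("wh", WH_ANSWER_TYPES[tok])
    return ("unknown", "UNKNOWN")
```

For instance, `classify_question("Where is Baidu headquartered?")` returns `("wh", "LOCATION")`. Real systems usually replace such rules with a trained classifier, but the rules make the two categories concrete.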
RELATED WORK ON COMMUNITY QUESTION ANSWERING
Bernhard D., Gurevych I. (2009) [1]: The authors evaluate three datasets for
training statistical word translation models for use in answer finding:
question-answer pairs, manually tagged question reformulations, and glosses for
the same term extracted from several lexical semantic resources. They conclude
that, to integrate semantics into retrieval, it is advisable to combine
knowledge specific to the task at hand (e.g., question-answer pairs) with
external knowledge such as that contained in lexical semantic resources. The
existing system lacks question analysis that automatically identifies the
question topic and question focus.
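Word translation models of this kind are typically plugged into a translation-based language model for ranking answers. The following is a minimal sketch of such a score, assuming a precomputed translation table P(w | t), a collection-level unigram model for smoothing, and a smoothing weight `lam = 0.2`; all names and values are illustrative and are not taken from Bernhard and Gurevych. For each query word w, the score adds log[(1 - λ) Σ_t P(w | t) P_ml(t | answer) + λ P_ml(w | collection)].

```python
import math
from collections import Counter


def translation_lm_score(query, answer, trans_prob, collection, lam=0.2):
    """Log P(query | answer) under a word-translation language model.

    query, answer: lists of tokens.
    trans_prob: dict mapping (query_word, answer_word) -> P(query_word | answer_word);
        self-translation entries such as ("tire", "tire") must be listed explicitly.
    collection: Counter of word counts over the whole archive (for smoothing).
    lam: Jelinek-Mercer smoothing weight (illustrative default).
    """
    if not answer:
        return float("-inf")
    ans_counts = Counter(answer)
    ans_len = len(answer)
    coll_total = sum(collection.values())
    log_score = 0.0
    for w in query:
        # Sum over answer words t of P(w | t) * P_ml(t | answer).
        p_trans = sum(
            trans_prob.get((w, t), 0.0) * c / ans_len for t, c in ans_counts.items()
        )
        p_coll = collection.get(w, 0) / coll_total if coll_total else 0.0
        p = (1 - lam) * p_trans + lam * p_coll
        if p == 0.0:
            # The query word is explained neither by the answer nor by the collection.
            return float("-inf")
        log_score += math.log(p)
    return log_score
```

With a translation entry such as P(fix | repair) = 0.5, an answer mentioning "repair" and "tire" outscores an unrelated answer for the query "fix tire", even though "fix" never appears in it; this is the lexical-gap effect the translation models target.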
Zhou G., Xie Z., He T. (2016) [4]: State-of-the-art approaches address the
lexical gap by implicitly expanding queried questions with additional words or
phrases using monolingual translation models. This work addresses question
retrieval in CQA by representing a question as a Bag-of-Embedded-Words (BoEW)
in a continuous space. The existing system lacks sufficient pairs to learn the
various translation models needed to bridge the lexical gap problem.
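To make the Bag-of-Embedded-Words idea concrete, the sketch below treats each question as a set of word vectors and compares two questions by a symmetric best-match similarity (for each word, the cosine to its closest counterpart in the other question, averaged in both directions). This aggregation is a simple stand-in chosen for illustration, not the similarity used by Zhou et al.; the two-dimensional `TOY_EMB` vectors are made-up values, whereas a real system would learn embeddings from a large corpus.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def boew_similarity(q1, q2, emb):
    """Symmetric best-match similarity between two bags of embedded words.

    q1, q2: lists of tokens; emb: dict token -> vector.
    Out-of-vocabulary tokens are skipped.
    """
    v1 = [emb[w] for w in q1 if w in emb]
    v2 = [emb[w] for w in q2 if w in emb]
    if not v1 or not v2:
        return 0.0
    # For each word, take its closest counterpart in the other question, then average.
    s12 = sum(max(cosine(a, b) for b in v2) for a in v1) / len(v1)
    s21 = sum(max(cosine(b, a) for a in v1) for b in v2) / len(v2)
    return (s12 + s21) / 2


# Toy 2-D embeddings (made-up values, for illustration only).
TOY_EMB = {
    "fix": [0.1, 1.0], "repair": [0.2, 0.9],
    "car": [1.0, 0.1], "automobile": [0.9, 0.2],
    "bake": [-0.9, -0.3], "cake": [-1.0, 0.0],
}
```

With these toy vectors, "fix car" and "repair automobile" score close to 1 although they share no words, while "fix car" and "bake cake" score below 0.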
Zhang W.-N., Ming Z.-Y., Zhang Y. (2016) [10]: The authors explore a key-concept
identification approach for query refinement and a pivot-language-translation
approach for key-concept paraphrasing. Their word embedding models contribute
the most to performance. The existing system generates noise samples for each
input word to estimate the target word, which causes inefficiency.
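Pivot-language paraphrasing is commonly formulated (following Bannard and Callison-Burch) as P(e2 | e1) = Σ_f P(e2 | f) P(f | e1), where f ranges over pivot-language translations of the key concept e1. The following is a minimal sketch, assuming the two phrase-translation tables are already available as nested dictionaries; the function name and table layout are hypothetical, and Zhang et al.'s exact model may differ.

```python
from collections import defaultdict


def pivot_paraphrases(phrase, src_to_pivot, pivot_to_src):
    """Rank paraphrases e2 of `phrase` by sum_f P(e2 | f) * P(f | phrase).

    src_to_pivot[e][f] = P(f | e); pivot_to_src[f][e] = P(e | f).
    The input phrase itself is excluded from the result.
    """
    scores = defaultdict(float)
    for f, p_f in src_to_pivot.get(phrase, {}).items():
        for e2, p_e2 in pivot_to_src.get(f, {}).items():
            if e2 != phrase:
                scores[e2] += p_e2 * p_f
    # Highest-probability paraphrases first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

For example, with a hypothetical table in which "laptop" translates to the French pivot "ordinateur portable" with probability 0.8 and that pivot translates back to "notebook" with probability 0.4, "notebook" receives a paraphrase score of 0.32; the refined query can then include the top-ranked paraphrases of each key concept.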
Qiu X., Huang X. (2015) [7]: The authors propose a convolutional neural tensor
network architecture that encodes sentences in a semantic space and models
their interactions with a tensor layer; the architecture also helps to learn
better word embeddings. The existing system cannot efficiently detect local
reuse at the semantic level for large-scale problems.
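The tensor layer that models the interaction of two encoded sentences can be written, in the usual neural tensor network form, as s(q, a) = uᵀ f(qᵀ M^[1:r] a + V[q; a] + b), where each of the r slices M_k is a d × d matrix. The sketch below computes this score in plain Python for already-encoded sentence vectors q and a; the convolutional sentence encoder is omitted, f = tanh is assumed, and in practice all parameters would be learned rather than set by hand.

```python
import math


def tensor_layer_score(q, a, M, V, b, u):
    """Neural-tensor-style matching score between sentence vectors q and a.

    s = u . tanh([q^T M_k a for k in 1..r] + V [q; a] + b)
    M: list of r slices, each a d x d matrix (list of lists).
    V: r x 2d matrix; b, u: vectors of length r.
    """
    qa = q + a  # concatenation [q; a]
    hidden = []
    for k in range(len(M)):
        # Bilinear term q^T M_k a captures multiplicative interactions.
        bilinear = sum(
            q[i] * M[k][i][j] * a[j] for i in range(len(q)) for j in range(len(a))
        )
        # Standard linear term on the concatenated vectors.
        linear = sum(V[k][i] * qa[i] for i in range(len(qa)))
        hidden.append(math.tanh(bilinear + linear + b[k]))
    return sum(u_k * h_k for u_k, h_k in zip(u, hidden))
```

With a single identity slice, zero V and b, and u = [1], the score reduces to tanh(q · a): parallel unit vectors give tanh(1) and orthogonal ones give 0. Additional slices let the layer capture several kinds of question-answer interaction at once.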
Zhou G., He T., Zhao J., Hu P. (2015) [15]: The authors use the Fisher kernel
framework to aggregate the word embeddings of a question into fixed-length
vectors, and show that category metadata benefits word embedding learning for
question representation. The existing system has problems from different
aspects, such as extraction methods with or without …
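The Fisher kernel step turns a variable-size set of word embeddings into a fixed-length vector. The following is a minimal sketch of the mean-gradient part of a Fisher vector under a diagonal-covariance Gaussian mixture model (GMM), assuming the GMM parameters are already given; a real system would fit the GMM on training embeddings and usually also adds variance gradients and normalization. For K components in d dimensions, the output always has K·d entries, no matter how many words the question contains.

```python
import math


def _log_gauss_diag(x, mean, std):
    # Log density of a Gaussian with diagonal covariance.
    return sum(
        -0.5 * math.log(2 * math.pi * s * s) - (xi - m) ** 2 / (2 * s * s)
        for xi, m, s in zip(x, mean, std)
    )


def fisher_vector_means(vectors, weights, means, stds):
    """Mean-gradient Fisher vector of a set of d-dim word vectors under a K-component GMM.

    weights: K mixture weights; means, stds: K lists of length d.
    Returns a list of K * d floats regardless of how many vectors are given.
    """
    K, d = len(weights), len(means[0])
    fv = [0.0] * (K * d)
    if not vectors:
        return fv
    for x in vectors:
        # Posterior responsibility gamma(k) of each component, via log-sum-exp.
        logs = [math.log(weights[k]) + _log_gauss_diag(x, means[k], stds[k]) for k in range(K)]
        top = max(logs)
        exps = [math.exp(lg - top) for lg in logs]
        z = sum(exps)
        for k in range(K):
            gamma = exps[k] / z
            for i in range(d):
                fv[k * d + i] += gamma * (x[i] - means[k][i]) / stds[k][i]
    # Normalize by the number of words and the component weight.
    T = len(vectors)
    for k in range(K):
        scale = 1.0 / (T * math.sqrt(weights[k]))
        for i in range(d):
            fv[k * d + i] *= scale
    return fv
```

Because every question maps to a vector of the same length, questions of different lengths can then be compared with an ordinary vector similarity.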
Zhang K., Wu W., Wu H., Li Z., Zhou M. (2014) [12]: Questions and answers are
heterogeneous at both the literal level and the level of user behaviors. The
authors conduct a series of experiments to evaluate the proposed approaches
automatically on large-scale data sets. The existing system cannot be used
directly for large-scale problems.
Zhou G., Chen Y., Zeng D., Zhao J. (2013) [14]: The authors propose a novel
Question-Answer Topic Model (QATM) that learns latent topics aligned across
question-answer pairs to alleviate the lexical gap problem, together with a
faster and better retrieval model for question search that leverages the
user-chosen category. The existing system does not capture the localness and
hierarchy intrinsic to natural language.
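One simple way to use the user-chosen category for faster question search is to restrict the candidate set to archived questions in that category before ranking. The sketch below does this with a word-overlap (Jaccard) score; the archive layout and the scoring function are assumptions for illustration, not QATM or the retrieval model of Zhou et al.

```python
import re


def jaccard(a, b):
    # Word-overlap similarity between two questions.
    sa = set(re.findall(r"\w+", a.lower()))
    sb = set(re.findall(r"\w+", b.lower()))
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def category_search(query, archive, category, top_k=3):
    """archive: list of dicts with 'question' and 'category' keys (hypothetical layout)."""
    # Only questions in the user-chosen category are scored, shrinking the search space.
    candidates = [item for item in archive if item["category"] == category]
    ranked = sorted(
        candidates, key=lambda item: jaccard(query, item["question"]), reverse=True
    )
    return ranked[:top_k]
```

The speed-up comes from scoring only one category's questions; the trade-off is that relevant questions filed under a different category are never considered.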
Various types of Q&A blogs are available, including general, technical, and
medical blogs, but blogs that combine these categories are very few. Moreover,
questions posted in medical blogs are likely to take longer to be answered,
because answers can only be posted by medical experts and doctors, who rarely
find time to spend online. General and technical blogs are very common, and any
user can suggest answers to questions posted in them. Medical blogs do not
accept answers from general users, because incorrect answers could have
harmful consequences for human lives.