
Natural Language Processing: A Review

Survey Paper  ·  International Journal of Engineering Research and Technology (IJERT)  ·  ISSN 2249-3905

Captain Ethan
Maritime 4.0 · AI, Data & Cyber Security
📅 April 9, 2026
Paper Details
Title Natural Language Processing: A Review
Type Survey / Review Paper
Journal International Journal of Engineering Research and Technology (IJERT)
ISSN 2249-3905
Scope NLP fundamentals · core tasks · approaches from rule-based to neural · applications
※ This review reflects the reviewer's independent analysis and does not represent the views of the original authors.

Natural Language Processing sits at the intersection of linguistics, computer science, and artificial intelligence. From the early rule-based parsers of the 1950s to today's large language models, the field has passed through three distinct paradigms. This survey maps the full arc, providing a structured entry point for practitioners who need to understand not just what NLP can do today, but why it works the way it does, and where the hard problems remain.

Contents of This Review
  1. What Is NLP — Scope and Goals
  2. Three Paradigms: Rule-Based → Statistical → Neural
  3. Core NLP Task Taxonomy
  4. Key Tasks in Depth
  5. Applications Across Domains
  6. Challenges and Open Problems
  7. Assessment & Closing Reflection

📌 (1) What Is NLP — Scope and Goals

Natural Language Processing (NLP) is the subfield of AI concerned with enabling computers to understand, interpret, generate, and interact with human language — in text and speech form. Unlike formal languages (programming languages, logic notation), natural language is inherently ambiguous, context-dependent, and constantly evolving.

🔤 Understanding

Parsing syntax, resolving semantics, identifying named entities, coreference resolution

🔁 Transformation

Translation, summarization, paraphrase, style transfer between languages and registers

💬 Generation

Fluent text generation, dialogue systems, question answering, story generation

📊 Analysis

Sentiment analysis, topic modeling, information extraction, intent classification

The central difficulty of NLP is that language meaning is not compositional in a simple way — the same sentence can mean different things in different contexts, and the same meaning can be expressed by infinitely many different sentences. No formal grammar fully captures natural language in use.

🔄 (2) Three Paradigms: Rule-Based → Statistical → Neural

NLP has passed through three distinct paradigms, each supplanting the last while borrowing its insights:

1. Rule-Based Systems (1950s–1980s)

Hand-crafted grammars, pattern matching, expert-coded linguistic rules. High precision on narrow domains. Brittle — breaks immediately outside the rule scope. Does not scale.
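To make the brittleness concrete, here is a minimal sketch of the rule-based style: hand-written regex rules that extract dates with high precision on exactly the surface forms they encode, and silence on everything else. The rules and the example sentence are illustrative, not taken from the survey.

```python
import re

# Hand-written patterns in the spirit of early rule-based extractors:
# each rule covers one narrow surface form and nothing else.
RULES = [
    ("DATE", re.compile(r"\b\d{1,2}\s+(?:January|February|March|April|May|June|"
                        r"July|August|September|October|November|December)\s+\d{4}\b")),
    ("DATE", re.compile(r"\b(?:19|20)\d{2}-\d{2}-\d{2}\b")),
]

def extract(text):
    """Return (label, match) pairs found by any rule."""
    hits = []
    for label, pattern in RULES:
        hits.extend((label, m.group()) for m in pattern.finditer(text))
    return hits

print(extract("The keel was laid on 12 March 1998; survey due 2024-05-01."))
# Finds both dates, but "March 12th, 1998" would be silently missed:
# exactly the rule-scope brittleness described above.
```

High precision inside the rule scope, zero recall outside it. Scaling this to full language meant writing and maintaining thousands of such rules, which is why the paradigm gave way.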

2. Statistical NLP (1990s–2010s)

n-gram language models, HMMs, CRFs, SVMs trained on large corpora. Replaced rules with probability estimates from data. Enabled machine translation (IBM models, phrase-based SMT) and part-of-speech tagging at scale.
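A maximum-likelihood bigram model, the simplest member of this family, fits in a few lines. The toy corpus below is illustrative; real systems trained on millions of sentences and added smoothing for unseen n-grams.

```python
from collections import Counter

corpus = "the ship left the port . the ship entered the port .".split()

# Count bigram occurrences and how often each word appears as a context.
bigrams = Counter(zip(corpus, corpus[1:]))
contexts = Counter(corpus[:-1])

def p(word, prev):
    """Maximum-likelihood estimate of P(word | prev) = count(prev, word) / count(prev)."""
    return bigrams[(prev, word)] / contexts[prev]

print(p("ship", "the"))  # 0.5: "the" is followed by "ship" in 2 of its 4 occurrences
print(p("port", "the"))  # 0.5
```

The shift from rules to counts is the whole point: the model's knowledge is an estimate from data, not an expert's specification.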

3. Neural NLP (2013–present)

Word embeddings (Word2Vec, GloVe) → sequence models (LSTM, GRU) → attention mechanisms → Transformers (BERT, GPT, T5). End-to-end learned representations replace hand-engineered features entirely.
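Embeddings make "related words" a measurable quantity via cosine similarity. A sketch with hand-set 3-dimensional toy vectors; real Word2Vec/GloVe vectors are learned from co-occurrence statistics and have hundreds of dimensions, but the geometry works the same way.

```python
import math

# Toy 3-d vectors standing in for learned embeddings (values are invented
# for illustration; trained vectors would come from a corpus).
emb = {
    "ship":   [0.9, 0.1, 0.0],
    "vessel": [0.8, 0.2, 0.1],
    "tax":    [0.0, 0.1, 0.9],
}

def cosine(u, v):
    """Cosine similarity: dot product over the product of vector norms."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

print(cosine(emb["ship"], emb["vessel"]))  # close to 1: related words
print(cosine(emb["ship"], emb["tax"]))     # close to 0: unrelated words
```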

The Transformer Turning Point

"Attention Is All You Need" (Vaswani et al., 2017) replaced recurrence with self-attention, enabling massive parallelization and scaling. Pre-trained Transformers (BERT, GPT) introduced the fine-tuning paradigm — pre-train on large corpora, fine-tune on task-specific data — which now dominates NLP across virtually every task.

🗂 (3) Core NLP Task Taxonomy

NLP tasks are traditionally organized by linguistic level. The survey categorizes them as follows:

Lexical: Tokenization · Morphological analysis · Spell checking · Word sense disambiguation
Syntactic: Part-of-speech tagging · Dependency parsing · Constituency parsing · Chunking
Semantic: Named Entity Recognition (NER) · Semantic Role Labeling · Coreference resolution · Relation extraction
Discourse: Text coherence · Discourse parsing · Sentiment analysis · Topic segmentation
Application: Machine translation · Text summarization · Question answering · Dialogue systems · Text classification

🔬 (4) Key Tasks in Depth

Named Entity Recognition (NER)

Identifies and classifies named mentions (persons, organizations, locations, dates) in text. Evolved from hand-crafted gazetteers → CRF sequence labeling → BiLSTM-CRF → BERT fine-tuning. Now achieves near-human F1 on standard benchmarks (CoNLL-2003).
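Whatever the model, these taggers share an output format: per-token BIO labels (B-X begins an entity, I-X continues it, O is outside), which a small post-processing step turns into entity spans. A sketch with an invented example sentence:

```python
def bio_to_spans(tokens, tags):
    """Convert per-token BIO labels into (label, text) entity spans,
    the usual decoding step after a CRF or BERT tagger."""
    spans, current, label = [], [], None
    for tok, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current:
                spans.append((label, " ".join(current)))
            current, label = [tok], tag[2:]
        elif tag.startswith("I-") and current:
            current.append(tok)
        else:  # "O", or an I- tag with no open entity
            if current:
                spans.append((label, " ".join(current)))
            current, label = [], None
    if current:
        spans.append((label, " ".join(current)))
    return spans

tokens = ["Ethan", "joined", "Lloyd", "'s", "Register", "in", "London"]
tags   = ["B-PER", "O", "B-ORG", "I-ORG", "I-ORG", "O", "B-LOC"]
print(bio_to_spans(tokens, tags))
# [('PER', 'Ethan'), ('ORG', "Lloyd 's Register"), ('LOC', 'London')]
```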

Machine Translation (MT)

The oldest large-scale NLP application. Statistical MT (phrase-based, IBM models) dominated until 2016, when neural MT (seq2seq + attention) became the standard. Transformer-based architectures now underpin all major MT systems (Google Translate, DeepL).

Sentiment Analysis

Classifies the sentiment polarity (positive/negative/neutral) or emotion of text. Spans document-level classification, sentence-level, and aspect-based sentiment (ABSA) — identifying sentiment toward specific entities or attributes within a document.
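The simplest baseline is a lexicon lookup with naive negation handling; modern systems fine-tune pre-trained models instead, but the baseline shows what the task asks. The lexicon words and weights here are invented for illustration:

```python
# Toy sentiment lexicon; real systems learn these weights from data
# or fine-tune a pre-trained model.
LEXICON = {"good": 1, "excellent": 2, "poor": -1, "terrible": -2}

def classify(text):
    """Sum lexicon scores, flipping the sign of any word preceded by 'not'."""
    words = text.lower().split()
    total = 0
    for i, w in enumerate(words):
        s = LEXICON.get(w, 0)
        if i > 0 and words[i - 1] == "not":
            s = -s  # crude negation handling
        total += s
    return "positive" if total > 0 else "negative" if total < 0 else "neutral"

print(classify("The food was excellent"))    # positive
print(classify("The service was not good"))  # negative
```

The baseline's failure modes (sarcasm, long-range negation, domain-specific polarity) are precisely what pushed the task toward learned models and aspect-based formulations.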

Question Answering (QA)

Extractive QA (SQuAD) finds answer spans within a given passage. Open-domain QA retrieves relevant documents first, then extracts answers. Generative QA (as in GPT-style models) synthesizes answers rather than extracting them — enabling responses beyond what any single document contains.
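The retrieval step of open-domain QA can be sketched with simple term overlap standing in for TF-IDF or BM25 scoring. The documents and question below are invented examples:

```python
from collections import Counter

docs = [
    "The Suez Canal connects the Mediterranean Sea to the Red Sea.",
    "Ballast water treatment reduces the spread of invasive species.",
    "The Panama Canal uses a system of locks to raise ships.",
]

def tokens(text):
    return [w.strip(".,?").lower() for w in text.split()]

def overlap(question, doc):
    """Shared-term count: a crude stand-in for TF-IDF/BM25 relevance scoring."""
    q, d = Counter(tokens(question)), Counter(tokens(doc))
    return sum(min(q[t], d[t]) for t in q)

def retrieve(question):
    """Return the highest-scoring document; a reader model would then
    extract or generate the answer from it."""
    return max(docs, key=lambda d: overlap(question, d))

print(retrieve("Which canal connects the Mediterranean to the Red Sea?"))
```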

Text Summarization

Extractive methods select and combine existing sentences. Abstractive methods generate new sentences capturing the core meaning. Modern neural abstractive summarizers (PEGASUS, BART) approach human-level performance on news summarization benchmarks.
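A classic extractive baseline (in the spirit of Luhn's frequency method) scores each sentence by the average corpus frequency of its words and keeps the top ones. A minimal sketch on an invented three-sentence text:

```python
import re
from collections import Counter

def extractive_summary(text, n=1):
    """Rank sentences by the mean document-frequency of their words
    and keep the top n: a frequency-based extractive heuristic."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(w.lower() for s in sentences for w in re.findall(r"\w+", s))

    def score(s):
        words = re.findall(r"\w+", s.lower())
        return sum(freq[w] for w in words) / max(len(words), 1)

    return sorted(sentences, key=score, reverse=True)[:n]

text = ("The engine failed during the night watch. "
        "The crew restarted the engine after an hour. "
        "Dinner was served late.")
print(extractive_summary(text))
# Keeps the engine-failure sentence; the off-topic dinner sentence scores lowest.
```

Abstractive models replace this heuristic entirely, generating new sentences, but extractive baselines remain useful where faithfulness to the source text is non-negotiable.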

🌐 (5) Applications Across Domains

🏥 Healthcare

Clinical NLP for EHR analysis, ICD coding, adverse event detection, medical literature mining (PubMed NLP)

⚖️ Legal

Contract analysis, case law retrieval, regulatory compliance monitoring, legal document summarization

💹 Finance

News sentiment for trading signals, earnings call analysis, regulatory filing extraction, fraud detection in communications

🛡 Cybersecurity

Threat intelligence extraction from dark web text, malware report analysis, phishing detection, vulnerability disclosure NLP

⚓ Maritime

Port state control report mining, incident log analysis, classification society circular extraction, AIS vessel communication processing

🎓 Education

Automated essay scoring, reading comprehension assistance, intelligent tutoring systems, language learning feedback

⚠️ (6) Challenges and Open Problems

🔀 Ambiguity & Context

Lexical, syntactic, and pragmatic ambiguity remains difficult. Irony, sarcasm, and metaphor require world knowledge beyond linguistic pattern matching.

🌍 Low-Resource Languages

Most NLP advances are English-centric. The majority of the world's ~7,000 languages lack sufficient training data for neural approaches. Cross-lingual transfer and multilingual models address this partially.

🧠 Commonsense Reasoning

Language models learn statistical patterns but lack grounded world models. Commonsense inference — reasoning about physical, social, and temporal relationships — remains a fundamental gap.

🎭 Bias & Fairness

Models trained on internet text inherit and amplify social biases. Gender, racial, and cultural biases in NLP outputs are well-documented and difficult to fully eliminate without compromising model capability.

🎯 (7) Assessment & Closing Reflection

✔ Survey Value

Provides a structured map of the NLP landscape in a single accessible document. Valuable as an entry point for engineers and practitioners approaching NLP from adjacent fields.

✔ Breadth of Coverage

Covers the full pipeline from linguistic preprocessing to application-level systems, allowing readers to understand where specific techniques fit in the broader architecture.

⚠ Temporal Coverage

Survey papers in fast-moving fields age rapidly. For the neural NLP landscape post-2020 — including instruction-tuned LLMs, RLHF, and chain-of-thought prompting — more recent literature is essential reading.

NLP is now embedded in virtually every digital product that processes or generates text. The progression from brittle rules to probabilistic models to large neural networks is not merely a technical story — it reflects a deeper shift in how we think about encoding human knowledge: less explicit specification, more learned approximation from data.

For maritime and industrial applications — where documentation, regulations, incident reports, and operational logs represent dense, domain-specific text — NLP is not a future technology. It is an immediately applicable tool for extracting structure and insight from the language that already runs the industry.

Whether you are new to NLP or revisiting its foundations to contextualize the LLM era — this survey provides the vocabulary and structural map you need. Start with the task taxonomy. Understand the three paradigm shifts. Then read the Transformer paper. Everything else in modern NLP follows from those anchors.

— Captain Ethan, ShipPaulJobs

#NLP #PaperReview #NaturalLanguageProcessing #Transformer #BERT #MachineLearning #TextMining #SentimentAnalysis #AI #DeepLearning #Survey

Maritime professional focused on the intersection of vessel operations, classification society regulations, and OT/IT cybersecurity. Writing for engineers, consultants, and operators navigating Maritime 4.0 together.
