TokenMix Research Lab · 2026-07-02

Research AI Division of Labor: How to Combine GPT, Claude, Gemini, DeepSeek, and Real Skills

Summary

Many researchers still use AI in one giant chat box: literature review, paper polishing, code debugging, MATLAB errors, reviewer responses, and research diagrams all go into the same model.

That works for simple tasks, but on real research work it becomes unreliable quickly:

Long literature contexts get compressed or missed.
Paper revision becomes sentence-level polishing instead of structure, evidence, and citation checking.
Code, reproduction details, simulation, and diagram generation get mixed together, so the model sounds confident but loses track of where each implementation task begins and ends.

My current approach is different:

Models define capability boundaries.
Skills define workflows.
Scenarios decide which route to use.

Instead of asking “Which is better, GPT, Claude, or Gemini?”, ask:

Is this a long-text task, a PDF/chart task, a writing task, a peer-review task,
a reproduction-debugging task, or a figure-generation task?
Is there a real Skill that standardizes the workflow?
Which model fits that Skill best?

This article breaks down common academic scenarios and explains several verified Skills in detail: deep-research, academic-paper, academic-paper-reviewer, paper-context-resolver, and gpt-image-2. The MATLAB section is intentionally described as a symbolic math / simulation scenario, not as a skills.sh Skill.

First: Model, Skill, and Scenario Are Different Things

What a model is

A model is the underlying capability layer. For example:

openai/gpt-5.4 and openai/gpt-5.5: strong for research planning, outlines, complex decomposition, and multi-step writing.
anthropic/claude-sonnet-5: strong for long documents, related work, academic rewriting, and large-context synthesis.
anthropic/claude-opus-4.8: strong for complex argumentation, reviewer responses, rebuttal strategy, and high-quality prose.
google/gemini-2.5-pro and google/gemini-3.5-flash: strong for PDFs, charts, screenshots, web materials, and multimodal understanding.
deepseek/deepseek-v4-pro: strong for reasoning, code debugging, reproduction analysis, and technical route checking.
moonshot/kimi-k2.7-code: strong for long code, function chains, engineering tasks, and repository-level reading.
qwen/qwen3.7-max and qwen/qwen3.7-plus: useful for Chinese expression, local-language material, rewriting, and localization.
zhipu/glm-5v-turbo and zhipu/glm-5.2: useful for visual reasoning, Chinese academic expression, and auxiliary review work.
openai/gpt-image-2 and qwen/qwen-image-max: useful for academic diagrams, presentation visuals, and Chinese image-text posters.

The model tells you what it may be good at. It does not guarantee that your workflow is sound.

What a Skill is

A Skill is a specialized workflow. It is not a model and not a single prompt. A good Skill specifies:

when it should trigger,
what inputs it needs,
how the task is decomposed,
what checks happen in the middle,
what output format to return,
and when not to use it.

For example, deep-research is not just “think harder.” It is an academic research workflow. academic-paper-reviewer is not just “review my paper.” It simulates a multi-perspective peer-review process.

What a scenario is

A scenario is the real task you need to finish:

scan 30 papers and identify gaps,
write an abstract and introduction,
prepare a response to reviewers,
resolve the dataset split and evaluation protocol for a reproduction,
derive transfer functions or state-space equations in MATLAB,
generate a mechanism diagram or presentation figure.

The scenario decides the route. The model is chosen after that.

My Academic AI Division Table

Scenario	Preferred models	Skill / workflow	Why
Research planning	GPT-5.4 / GPT-5.5	Deep Research `socratic` or quick brief	Turns vague interests into variables, questions, and routes
Literature review / related work	Claude Sonnet 5 + Gemini 2.5 Pro	Deep Research `lit-review`	Claude handles long text; Gemini helps with PDFs and charts
Systematic review	Claude Sonnet 5 + Gemini 2.5 Pro	Deep Research `systematic-review`	Needs search strategy, inclusion criteria, and evidence tables
Fact checking	Gemini 2.5 Pro + Claude Sonnet 5	Deep Research `fact-check`	One model checks materials; another writes cautiously
Paper outline	GPT-5.4 + Claude Sonnet 5	Academic Paper `outline-only`	GPT structures; Claude improves academic expression
Paper revision	Claude Sonnet 5 + GPT-5.4	Academic Paper `revision-coach`	Diagnoses logic, evidence, and style instead of only polishing
Citation checking	Claude Sonnet 5 + Gemini 2.5 Pro	Academic Paper `citation-check`	Checks whether claims and references align
Reviewer response	Claude Opus 4.8 + GPT-5.4	Academic Paper `rebuttal-audit`	Requires tone, strategy, evidence, and boundaries
Simulated peer review	Claude Opus 4.8 + GLM-5.2	Academic Paper Reviewer	Multi-perspective criticism beats generic praise
Paper reproduction	DeepSeek V4 Pro + Claude Sonnet 5	paper-context-resolver	Resolves dataset split, preprocessing, checkpoint, and protocol gaps
Code debugging	DeepSeek V4 Pro + Kimi K2.7 Code	Coding workflow	Reasoning and long-code reading matter more than a named Skill
MATLAB derivation / simulation	GPT-5.4 + DeepSeek V4 Pro + Kimi K2.7 Code	MATLAB symbolic math scenario	Formula derivation, state-space equations, function drafts
Academic diagrams	GPT Image 2 + Qwen Image Max	gpt-image-2 Skill / image workflow	GPT Image 2 for structured layouts; Qwen for Chinese image-text
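
The table above can also be kept as a small lookup, so that a script or a lab wiki picks the route from the scenario first and the model second. This is a minimal sketch: the names `Route`, `ROUTES`, and `pick_route` are hypothetical, only three rows are copied from the table, and the model strings are plain labels rather than verified API identifiers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    """One row of the division table: preferred models plus the Skill or workflow."""
    models: tuple
    workflow: str


# Hypothetical lookup; the three rows are copied from the division table.
ROUTES = {
    "literature review": Route(("Claude Sonnet 5", "Gemini 2.5 Pro"), "deep-research lit-review"),
    "reviewer response": Route(("Claude Opus 4.8", "GPT-5.4"), "academic-paper rebuttal-audit"),
    "paper reproduction": Route(("DeepSeek V4 Pro", "Claude Sonnet 5"), "paper-context-resolver"),
}


def pick_route(scenario: str) -> Route:
    """Return the route for a scenario, failing loudly if the table has no such row."""
    key = scenario.strip().lower()
    if key not in ROUTES:
        raise KeyError(f"No route defined for scenario: {scenario!r}")
    return ROUTES[key]
```

Raising on an unknown scenario is deliberate: a missing row is a prompt to decide the route explicitly, not to fall back to one default model.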

Skill 1: Deep Research

Source: deep-research in the Academic Research Skills GitHub repository.
Repository: https://github.com/imbad0202/academic-research-skills

deep-research is an academic research workflow. It is most useful before formal writing begins, especially for literature review, systematic review, fact checking, and research-question clarification.

Common modes include:

lit-review: literature review mode for mapping a field.
systematic-review: systematic review mode for search strategy, inclusion/exclusion criteria, and evidence tables.
three-way-scan: a three-way literature scan for quickly mapping background, methods, debates, or applications.
fact-check: fact-checking mode for verifying whether a claim is supported.
socratic: guided research-question clarification.

When to use it

Use it when:

you are entering a new field,
you need to turn 10 to 50 papers into a research map,
you need a related-work section that synthesizes the field rather than stitching summaries together,
you want to verify a specific claim,
you are preparing a systematic review with explicit criteria.

Do not use it when:

you only need to polish one paragraph,
you already have a complete manuscript and only need formatting,
you expect the model to invent references.

Input template

For literature review:

Use deep-research in lit-review mode.

Topic: [your topic]
Field: [discipline]
Research goal: [question you want to answer]
Scope: [years, region, methods, population]
Must include: [keywords or theories]
Must exclude: [out-of-scope areas]
Output:
1. research question refinement
2. literature map
3. key debates
4. method clusters
5. evidence gaps
6. reading list with why each paper matters

For systematic review:

Use deep-research in systematic-review mode.

Research question:
Population / object:
Intervention / method:
Comparison:
Outcome:
Databases to consider:
Inclusion criteria:
Exclusion criteria:
Expected output: search strategy, screening table, evidence matrix, PRISMA-style summary.

Model pairing

My preferred pairing:

Claude Sonnet 5: long-context reading, literature synthesis, academic prose.
Gemini 2.5 Pro: PDFs, charts, screenshots, and multimodal materials.
GPT-5.4: research-question decomposition and output structure.

A practical chain:

GPT-5.4 clarifies the research question.
Deep Research + Claude Sonnet 5 builds the literature map.
Gemini 2.5 Pro handles PDF / chart / screenshot materials.
Claude summarizes the final related-work structure.

Skill 2: Academic Paper

Source: academic-paper in the Academic Research Skills GitHub repository.
Repository: https://github.com/imbad0202/academic-research-skills

academic-paper is for the writing stage. It is not merely a proofreading tool. It is a paper-writing pipeline covering configuration, structure, argumentation, abstract, citation, revision, rebuttal, and format conversion.

Common modes include:

outline-only: generate only the paper structure and outline.
revision-coach: diagnose and revise an existing draft.
citation-check: check in-text citations, references, and claim alignment.
rebuttal-audit: check whether a response to reviewers is complete and evidence-based.
format-convert: convert structure or citation format.
abstract-only: work only on the abstract.

When to use it

Use it when:

you have a research direction and need a paper structure,
you have a draft and want to improve logic, evidence, and style,
you want to check whether a claim is too large for the evidence,
you need to prepare a response to reviewers,
you need to rewrite an abstract for a target venue.

Do not use it when:

you have no materials and want a complete paper from nothing,
you want fabricated references,
you paste all reviewer comments without the manuscript or response draft.

Input templates

Outline:

Use academic-paper in outline-only mode.

Paper type: empirical / theoretical / review / case study
Field:
Target venue or style:
Research question:
Core claim:
Method / data:
Expected contribution:
Word count:
Citation style:
Output: section outline, argument flow, evidence needed for each section.

Revision coaching:

Use academic-paper in revision-coach mode.

Task: revise the following section.
Target: improve academic clarity, claim-evidence alignment, paragraph logic.
Do not invent citations.
Return:
1. diagnosis
2. revised version
3. change log
4. remaining risks

Draft:
[paste draft]

Reviewer response audit:

Use academic-paper in rebuttal-audit mode.

Inputs:
1. reviewer comments
2. current response draft
3. revised manuscript excerpt

Check:
- whether each concern is answered
- whether the tone is respectful
- whether evidence is specific
- whether every promise made in the response is reflected in the manuscript

Model pairing

Claude Sonnet 5: long drafts, related work, academic paragraphs.
GPT-5.4: structure, argument decomposition, response strategy.
GLM-5.2: Chinese academic writing and Chinese reviewer responses.

A stable paper-writing chain:

Deep Research builds the literature map.
Academic Paper outline-only creates the structure.
Claude Sonnet 5 drafts or revises long sections.
Academic Paper citation-check checks claim-reference alignment.
Academic Paper rebuttal-audit checks the response letter.

Skill 3: Academic Paper Reviewer

Source: academic-paper-reviewer in the Academic Research Skills GitHub repository.
Repository: https://github.com/imbad0202/academic-research-skills

academic-paper-reviewer is useful because it simulates a multi-perspective peer-review process. Instead of asking one model to “give feedback,” it configures different reviewer personas, including editor-in-chief, methodology reviewer, domain reviewer, cross-disciplinary reviewer, and Devil’s Advocate.

Common modes include:

methodology-focus: checks research design, data, experiments, and statistical reporting.
guided: guided review mode.
re-review: verifies whether revisions answered previous comments.
calibration: calibrates against review standards or previous reviewer behavior.

When to use it

Use it when:

you want a pre-submission simulated review,
you need to check whether your response letter missed anything,
you want to stress-test the methodology section,
you want to clarify the paper’s contribution,
you want the strongest counter-arguments before submission.

Do not use it when:

you plan to treat its verdict as a real journal decision,
you do not provide enough methods or experimental details,
you only need language polishing.

Input templates

Full pre-submission review:

Use academic-paper-reviewer in full review mode.

Paper field:
Target journal/conference:
Manuscript:
[paste manuscript or provide file]

Focus:
1. originality
2. methodology
3. claim-evidence alignment
4. literature positioning
5. limitations
6. likely reviewer objections

Output: editorial decision, reviewer reports, major/minor revisions, revision roadmap.

Methodology check:

Use academic-paper-reviewer in methodology-focus mode.

Only evaluate:
- research design
- data source
- sample selection
- measurement
- statistical / experimental validity
- robustness checks

Do not rewrite the paper yet.
Return a risk table and required fixes.

Re-review:

Use academic-paper-reviewer in re-review mode.

Inputs:
1. original reviewer comments
2. response letter
3. revised manuscript excerpts

Check whether every reviewer concern is actually resolved.
Mark unresolved, partially resolved, and fully resolved items.

Model pairing

Claude Opus 4.8: complex argumentation, reviewer perspective, rebuttal strategy.
GLM-5.2: Chinese papers and Chinese review comments.
GPT-5.4: turning review feedback into a revision checklist.

Skill 4: paper-context-resolver

Source: paper-context-resolver on skills.sh.
Page: https://www.skills.sh/lllllllama/ai-paper-reproduction-skill/paper-context-resolver

This Skill is designed for paper reproduction. Its boundary matters: it is not a general paper summarizer and not an environment setup tool. It is meant for narrow reproduction-critical gaps that remain after reading the README and repository files.

It is suited for:

dataset split,
preprocessing,
evaluation protocol,
checkpoint mapping,
runtime assumptions,
conflicts between paper, README, config, and code.

When to use it

Use it when:

the paper uses a dataset but the README does not specify the split,
several checkpoints exist and you do not know which one matches a table,
code defaults differ from the paper description,
evaluation protocols have multiple versions,
README and paper conflict and you need an evidence ledger.

Do not use it when:

you want a general paper summary,
you want it to configure your environment,
you want it to run experiments,
you only provide a title with no repo or specific question.

Input template

Use paper-context-resolver.

Paper:
[paper title / DOI / arXiv / PDF link]

Repository:
[GitHub repo]

Reproduction question:
Which dataset split and preprocessing settings correspond to Table 2?

Known conflict:
README says [A], paper says [B], config file says [C].

Output:
1. primary evidence from paper
2. evidence from repo / README / config
3. conflict table
4. most likely reproduction setting
5. uncertainty and what to test next

Model pairing

Claude Sonnet 5: paper, README, issues, and long-context evidence.
DeepSeek V4 Pro: config analysis, code logic, metrics, and reproduction discrepancies.
Kimi K2.7 Code: large repositories and long function chains.

My usual workflow:

Read the README first.
Ask paper-context-resolver one narrow reproduction question.
Use DeepSeek / Kimi to inspect code and config.
Write the conclusion as a reproduction note.

Skill 5: gpt-image-2

Source: gpt-image-2 on skills.sh.
Page: https://www.skills.sh/doany-ai/skills/gpt-image-2

This Skill targets GPT Image 2 image generation and editing. It is useful for structured visual tasks, especially when the image needs layout, labels, arrows, workflow structure, or visual consistency.

Academic use cases:

mechanism diagram drafts,
technical route diagrams,
research workflow diagrams,
lab meeting cover slides,
teaching knowledge cards and WeChat article visuals,
bilingual or Chinese academic posters.

When to use it

Use it when:

you know the modules that must appear in the image,
the image needs flow, labels, and layout,
you want to turn a mechanism into an explanatory diagram,
you need multiple images in the same style.

Do not use it for:

exact data plots,
precise coordinates or statistical values,
final experimental images in a paper.

Input templates

Academic workflow diagram:

Use gpt-image-2 for a clean academic workflow diagram.

Canvas: 16:9 slide
Style: white background, journal presentation style, minimal colors
Elements:
1. Data collection
2. Preprocessing
3. Model training
4. Evaluation
5. Error analysis

Text labels must be short and readable.
Do not add extra logos or decorative text.
Use arrows to show direction.

Chinese research card:

Create a vertical 3:4 Chinese academic knowledge card.
Style: white paper, dark blue text, mint highlights, coral accent.
Topic: GPT / Claude / Gemini research workflow.
Keep all text inside safe margins.
No real logos, no QR code, no watermark.

Model pairing

GPT Image 2: structured diagrams, English labels, precise layout instructions.
Qwen Image Max: Chinese image-text posters and dense Chinese labels.

Practical workflow:

GPT-5.4 writes the image structure as bullet points.
gpt-image-2 generates the English or neutral structure diagram.
Qwen Image Max generates the Chinese poster version if needed.
Human checks labels, arrows, and logic.
Final paper figures can be redrawn in vector tools if precision is required.

MATLAB: Treat It as a Symbolic Math / Simulation Scenario

Source: MathWorks MATLAB Blog on Agentic AI Playground and Symbolic Math Skills.
Article: https://blogs.mathworks.com/matlab/2026/06/25/from-whiteboard-sketch-to-pareto-front-using-symbolic-math-skills-in-the-agentic-ai-playground/

For MATLAB, I would not label it as a skills.sh Skill. A safer wording is:

MATLAB symbolic math scenario
MATLAB derivation and simulation scenario
Symbolic Math Skills example in Agentic AI Playground

Useful tasks include:

deriving transfer functions,
converting to state-space equations,
manipulating symbolic variables and equations,
variable-precision arithmetic,
generating MATLAB function drafts,
explaining MATLAB errors,
turning derivations into lab notes.

Input templates

Derivation:

I am working on a MATLAB simulation.

Goal:
Derive the transfer function and convert it to state-space equations.

Given equations:
[paste equations]

Please:
1. define variables
2. show symbolic derivation
3. identify assumptions
4. produce MATLAB function draft
5. list checks I should run in MATLAB
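
As a reference point for what steps 1 to 3 of this template should return, here is a worked derivation for a mass-spring-damper system. The system and the symbols $m$, $c$, $k$, $u$, and $x$ are assumptions chosen for illustration; they are not taken from the MathWorks article.

```latex
\begin{aligned}
&\text{Equation of motion:} && m\ddot{x}(t) + c\dot{x}(t) + k x(t) = u(t) \\
&\text{Laplace transform (zero initial conditions):} && (m s^2 + c s + k)\,X(s) = U(s) \\
&\text{Transfer function:} && G(s) = \frac{X(s)}{U(s)} = \frac{1}{m s^2 + c s + k} \\
&\text{State variables:} && x_1 = x,\quad x_2 = \dot{x} \\
&\text{State-space form:} &&
\begin{bmatrix}\dot{x}_1 \\ \dot{x}_2\end{bmatrix}
=
\begin{bmatrix} 0 & 1 \\ -\tfrac{k}{m} & -\tfrac{c}{m} \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}
+
\begin{bmatrix} 0 \\ \tfrac{1}{m} \end{bmatrix} u,
\qquad
y = \begin{bmatrix} 1 & 0 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}
\end{aligned}
```

A natural item for step 5 is to confirm that $C(sI - A)^{-1}B$ reduces back to $G(s)$; for the matrices above it gives $1/(m s^2 + c s + k)$.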

Debugging:

Here is my MATLAB error and code.

Task:
Explain the error, locate the likely cause, and suggest a minimal fix.

Constraints:
Do not rewrite the whole project.
Keep variable names unchanged.
Return the corrected snippet and why it works.

Model pairing

GPT-5.4: explanation, derivation steps, function drafts.
DeepSeek V4 Pro: reasoning, debugging, algorithmic logic.
Kimi K2.7 Code: long MATLAB code and multi-function chains.

Three Complete Workflows

Workflow 1: From topic idea to literature review

1. GPT-5.4 turns a vague interest into three research questions.
2. Deep Research `socratic` clarifies variables, object, method, and boundary.
3. Deep Research `lit-review` builds the literature map.
4. Gemini 2.5 Pro handles PDFs, tables, and screenshots.
5. Claude Sonnet 5 writes the related-work structure.
6. Academic Paper `citation-check` checks claim-reference alignment.

Best for: thesis proposal, doctoral topic exploration, early course paper work.

Workflow 2: From draft to pre-submission check

1. Academic Paper `outline-only` checks the structure.
2. Claude Sonnet 5 revises introduction / related work.
3. Academic Paper `revision-coach` diagnoses each section.
4. Academic Paper Reviewer `methodology-focus` checks methods.
5. Academic Paper Reviewer full review simulates peer review.
6. GPT-5.4 turns major revisions into an action checklist.

Best for: pre-submission self-review, revision planning, finding logic gaps.

Workflow 3: From reproduction issue to experiment note

1. Read README and official repo first.
2. paper-context-resolver answers one reproduction-critical question.
3. Claude Sonnet 5 organizes paper / README / issue evidence.
4. DeepSeek V4 Pro analyzes code and evaluation protocol.
5. Kimi K2.7 Code reads long code paths if needed.
6. GPT-5.4 writes a reproduction note.

Best for: deep learning reproduction, mismatched metrics, unclear data splits.

Why Pay-as-You-Go Multi-Model Access Fits Researchers

Academic AI usage is uneven.

During paper deadlines, rebuttal weeks, or reproduction work, usage can be intense. During experiments, classes, or reading periods, you may not open the tools for days.

Monthly subscriptions across every platform create friction:

fixed monthly fees but uneven usage,
Claude for long text, Gemini for charts, GPT for planning, all in separate subscriptions,
lab groups cannot easily track who used what,
API keys, balances, and invoices are scattered.

That is why I prefer matching tools by task type:

long text -> Claude
PDF / chart -> Gemini
planning / structure -> GPT
code / reasoning -> DeepSeek / Kimi
Chinese writing -> Qwen / GLM
research diagrams -> GPT Image 2 / Qwen Image Max

With a multi-model entry point like TokenMix, these models can be used from one account on a pay-as-you-go basis. For researchers, the point is not finding one model that is always best. The point is avoiding fixed monthly costs for models you only need occasionally.

Before You Run Any Skill: Build a Research AI Input Pack

Many people think AI is unstable because the model is weak. In academic work, the more common reason is scattered input. Research tasks need context, boundaries, evidence, and output constraints. If you paste random fragments every time, the model has to guess the workflow.

I recommend preparing a reusable input pack:

Input material	What it contains	Useful for
Project card	topic, field, object, keywords, scope, exclusions	Deep Research, Academic Paper
Literature list	DOI, title, year, method, finding, why it matters	Deep Research, Academic Paper
Manuscript draft	title, abstract, introduction, method, results, discussion	Academic Paper, Academic Paper Reviewer
Citation table	in-text citation, reference, supported claim	Academic Paper `citation-check`
Reviewer package	reviewer comments, response draft, revised excerpts	Academic Paper `rebuttal-audit`, Reviewer `re-review`
Reproduction package	paper, repo, README, config, error, metric gap	paper-context-resolver, DeepSeek, Kimi
MATLAB package	equations, variables, current code, error, expected output	MATLAB symbolic math / debugging scenario
Figure package	purpose, size, required elements, forbidden elements, text	gpt-image-2, Qwen Image Max

A minimal project card:

Project card

Topic:
Field:
Research object:
Core question:
Keywords:
Scope:
Do not cover:
Current stage: topic exploration / literature review / draft / revision / reproduction
Target output:
Deadline:
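
If the card lives in a script rather than a text file, a small data class can render it and report what is still empty. This is a sketch under assumptions: `ProjectCard`, `missing`, and `render` are hypothetical names, and the fields simply mirror the card above.

```python
from dataclasses import dataclass, fields


@dataclass
class ProjectCard:
    """Fields mirror the minimal project card; an empty string means 'not filled in'."""
    topic: str = ""
    field: str = ""
    research_object: str = ""
    core_question: str = ""
    keywords: str = ""
    scope: str = ""
    do_not_cover: str = ""
    current_stage: str = ""
    target_output: str = ""
    deadline: str = ""

    def missing(self) -> list:
        """Names of the fields that are still empty, in card order."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]

    def render(self) -> str:
        """Plain-text block that can be pasted at the top of any prompt."""
        lines = ["Project card", ""]
        for f in fields(self):
            label = f.name.replace("_", " ").capitalize()
            lines.append(f"{label}: {getattr(self, f.name)}")
        return "\n".join(lines)
```

Calling `missing()` before running a Skill is a cheap way to catch a card that has a topic but no scope or exclusions.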

The key benefit: models and Skills can change, but your context stays stable.

General Rule: Do Not Ask the Skill to “Write” Too Early

The most common academic AI mistake is asking for the final draft immediately.

A safer pattern:

Step 1: ask the model to identify task type and missing inputs
Step 2: let the Skill produce intermediate structure
Step 3: generate the final prose, table, figure, or revision

For example, instead of saying:

Write my related work.

Use:

I am preparing a related work section.
First, check whether my inputs are sufficient:
1. Is the research question clear?
2. Can the literature be grouped by method or debate?
3. Are key opposing papers missing?
4. Does each claim have citation support?
Do not write the prose yet. Return only missing inputs and a suggested workflow.

This prevents a lot of fluent but hollow academic writing.

Output Quality Gates for Each Skill

What a good Deep Research output should include

A strong deep-research output should not be just “10 paper summaries.” It should contain:

refined research questions,
literature clusters by theory, method, object, or finding,
key debates and why papers disagree,
methodology map,
evidence strength,
research gaps with boundaries,
a must-read sequence.

Quality-gate prompt:

Audit the deep-research output.

Check whether it includes:
1. refined research questions
2. literature clusters
3. key debates
4. methodology map
5. evidence strength
6. research gaps
7. must-read sequence

Mark each item as pass / partial / missing.
If missing, tell me what input I need to provide.
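
When the audit comes back, the verdicts can be collected programmatically so that partial and missing items become a to-do list. The sketch below assumes the model answers one item per line in the form `<item>: <verdict>`, optionally numbered; that line format and the names `parse_audit` and `needs_input` are assumptions, not part of the Skill.

```python
import re

# Assumed line format: "3. key debates: partial" (numbering optional, case-insensitive verdict).
LINE = re.compile(
    r"^\s*(?:\d+\.\s*)?(?P<item>.+?)\s*:\s*(?P<verdict>pass|partial|missing)\s*$",
    re.IGNORECASE,
)


def parse_audit(text: str) -> dict:
    """Map each audited item to its verdict; lines in any other shape are ignored."""
    result = {}
    for line in text.splitlines():
        match = LINE.match(line)
        if match:
            result[match.group("item").strip().lower()] = match.group("verdict").lower()
    return result


def needs_input(audit: dict) -> list:
    """Items that did not fully pass and therefore need more input or a rerun."""
    return [item for item, verdict in audit.items() if verdict != "pass"]
```

If the model does not follow the assumed format, `parse_audit` returns fewer items than expected, which is itself a signal to restate the output format in the prompt.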

What a good Academic Paper output should include

A strong academic-paper output should make the manuscript more publishable, not just more polished:

each section has a clear function,
introduction narrows from problem to gap to contribution,
related work is a research map rather than stitched summaries,
method contains reproducible details,
results explain what metrics mean,
discussion does not overclaim,
citations support specific claims.

Quality-gate prompt:

Audit this Academic Paper output.

Check:
1. Does each section have a clear function?
2. Are claims proportional to evidence?
3. Are citations used to support specific claims?
4. Is the contribution explicit?
5. Are limitations honest and specific?
6. Does the revised text preserve my intended meaning?

Return a table: issue / severity / suggested fix.

What a good Academic Paper Reviewer output should include

Simulated review has two failure modes: vague praise and performative harshness.

A useful review should include:

editorial decision tendency and rationale,
major concerns that affect validity,
minor concerns separated from fatal issues,
methodology risks,
contribution clarity,
revision roadmap,
confidence level and missing evidence.

Quality-gate prompt:

Audit the simulated peer review.

Check:
1. Are major concerns evidence-based?
2. Are minor comments separated from fatal issues?
3. Does the review identify methodology risks?
4. Does it give actionable revision steps?
5. Does it avoid asking for irrelevant extra work?

Return a reviewer-quality score from 1 to 5 and explain.

What a good paper-context-resolver output should include

This Skill is narrow by design. A good output is an evidence ledger, not a paper summary.

It should include:

the exact reproduction question,
paper evidence,
repo / README / config evidence,
conflict table,
most likely reproduction setting,
uncertainty,
next minimal test.

Quality-gate prompt:

Audit the paper-context-resolver output.

It should not summarize the whole paper.
It should answer one reproduction-critical question.

Check:
1. Is the question narrow?
2. Does it cite paper evidence?
3. Does it cite repo/config evidence?
4. Does it record conflicts?
5. Does it separate certainty from hypothesis?
6. Does it suggest the next minimal test?
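
An evidence ledger is easier to audit when it has a fixed shape. The following sketch shows one possible structure; `Evidence`, `LedgerEntry`, and their fields are hypothetical names that follow the output items listed above, not a format defined by paper-context-resolver.

```python
from dataclasses import dataclass, field


@dataclass
class Evidence:
    source: str    # where the statement comes from, e.g. "paper", "README", "config"
    location: str  # section, file path, or line reference
    claim: str     # what this source says about the setting in question


@dataclass
class LedgerEntry:
    question: str                                 # the exact reproduction question
    evidence: list = field(default_factory=list)  # Evidence items from all sources
    chosen_setting: str = ""                      # most likely reproduction setting
    uncertainty: str = ""                         # what remains a hypothesis
    next_test: str = ""                           # next minimal test to run

    def has_conflict(self) -> bool:
        """True when the collected sources do not all state the same claim."""
        return len({e.claim for e in self.evidence}) > 1

    def sources(self) -> set:
        """Which kinds of source have been consulted so far."""
        return {e.source for e in self.evidence}
```

Keeping `uncertainty` as its own field forces the note to separate what the sources state from what was inferred.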

What a good gpt-image-2 output should include

Academic figures are not only about visual appeal. A useful generated figure should have:

correct structure,
readable labels,
a single clear message,
reusable style,
no fabricated data.

Quality-gate prompt:

Evaluate this generated research figure.

Check:
1. Are all required elements present?
2. Is the flow direction correct?
3. Are text labels readable?
4. Is there any extra or hallucinated element?
5. Is the style appropriate for paper / slides / social post?
6. What should be edited manually before publishing?

Model Fallback: How to Switch When the First Model Fails

Model switching should not be random. Switch based on failure type:

Failure type	Common symptom	Fallback strategy
Missing long-context details	appendix, tables, or limitations ignored	Claude Sonnet 5 / Gemini 2.5 Pro
Misreading charts	axes, legends, table meaning wrong	Gemini 2.5 Pro / Gemini 3.5 Flash
Hollow logic	many claims, weak reasoning chain	GPT-5.4 / DeepSeek V4 Pro
Broken code edits	too many changes, renamed variables	Kimi K2.7 Code / DeepSeek V4 Pro
Awkward Chinese prose	translated tone, unnatural phrasing	Qwen3.7 Max / GLM-5.2
Too gentle as reviewer	praise without critique	Claude Opus 4.8 + Academic Paper Reviewer
Too harsh as reviewer	irrelevant extra demands	guided mode with limited criteria
Bad image text	wrong labels, distorted Chinese text	Qwen Image Max or manual layout

A useful fallback instruction:

The previous model output failed because:
[specific failure]

Redo the task with a different emphasis:
- preserve original meaning
- fix only the failed part
- do not rewrite unrelated sections
- explain what changed and why

Lab / Research Group Workflow: Do Not Let Everyone Ask From Scratch

For a lab group, the worst pattern is every student asking from zero. A better pattern is shared AI-ready material:

lab_ai/
  project_cards/
  literature_matrix/
  paper_drafts/
  reviewer_comments/
  reproduction_notes/
  prompts/
  figures/
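
The layout above can be created once per project with a few lines of Python. A minimal sketch, assuming a hypothetical helper named `create_lab_ai`; the folder names are copied from the listing.

```python
from pathlib import Path

# Folder names copied from the lab_ai/ layout.
SUBFOLDERS = [
    "project_cards",
    "literature_matrix",
    "paper_drafts",
    "reviewer_comments",
    "reproduction_notes",
    "prompts",
    "figures",
]


def create_lab_ai(root: Path) -> Path:
    """Create root/lab_ai and its subfolders; safe to run more than once."""
    base = root / "lab_ai"
    for name in SUBFOLDERS:
        (base / name).mkdir(parents=True, exist_ok=True)
    return base
```

For example, `create_lab_ai(Path.cwd())` builds the tree in the current project folder and leaves existing files untouched.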

Suggested shared tables

Literature matrix:

Field	Meaning
paper_id	internal ID
citation	formatted citation
research question	what the paper answers
method	research method or design
data	dataset or sample used
key finding	main finding
limitation	stated limitation
useful for	which section it supports
evidence strength	strong / medium / weak
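
One way to keep the matrix consistent across students is to write it through a small helper that fixes the columns and the allowed strength labels. This is a sketch: `COLUMNS`, `ALLOWED_STRENGTH`, and `to_csv` are hypothetical names, and the column names are the fields above with spaces replaced by underscores.

```python
import csv
import io

# Column names follow the literature matrix fields (spaces replaced by underscores).
COLUMNS = [
    "paper_id", "citation", "research_question", "method", "data",
    "key_finding", "limitation", "useful_for", "evidence_strength",
]
ALLOWED_STRENGTH = {"strong", "medium", "weak"}


def to_csv(rows: list) -> str:
    """Serialize matrix rows (dicts) to CSV text, rejecting unknown strength labels."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        if row.get("evidence_strength") not in ALLOWED_STRENGTH:
            raise ValueError(f"evidence_strength must be one of {sorted(ALLOWED_STRENGTH)}")
        writer.writerow(row)  # missing columns are written as empty cells
    return buffer.getvalue()
```

Rows with columns outside `COLUMNS` are rejected by `csv.DictWriter` itself, so ad hoc fields cannot creep into the shared file.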

Reproduction note:

Field	Meaning
paper	target paper
repo	official repository
question	narrow reproduction question
paper evidence	evidence from paper
repo evidence	evidence from repository
conflict	conflict point
chosen setting	final selected setting
next test	next experiment

Reviewer response tracker:

Field	Meaning
reviewer	R1 / R2 / R3
comment id	comment number
issue type	method / writing / citation / experiment
response status	done / partial / pending
manuscript change	where it changed
evidence	supporting material
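
Once the tracker exists as rows, a status summary before resubmission takes a few lines. A sketch under assumptions: each row is a dict keyed by the field names above with underscores (`reviewer`, `comment_id`, `response_status`), and `summarize` and `open_items` are hypothetical helpers.

```python
from collections import Counter

STATUSES = ("done", "partial", "pending")


def summarize(tracker: list) -> dict:
    """Count tracker rows per response status, rejecting labels outside STATUSES."""
    counts = Counter(row["response_status"] for row in tracker)
    unknown = set(counts) - set(STATUSES)
    if unknown:
        raise ValueError(f"Unknown response status: {sorted(unknown)}")
    return {status: counts.get(status, 0) for status in STATUSES}


def open_items(tracker: list) -> list:
    """Comments that still need work, as (reviewer, comment_id) pairs in tracker order."""
    return [
        (row["reviewer"], row["comment_id"])
        for row in tracker
        if row["response_status"] != "done"
    ]
```

The `open_items` list is a natural input for a `re-review` pass, since it names exactly the comments that are not yet fully answered.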

The goal is to turn Skill outputs into lab assets, not disposable chat logs.

A One-Week Starter Plan

Day 1: Build your project card

Use GPT-5.4 to turn your research interest into three concrete research questions.

Day 2: Run Deep Research `lit-review`

Pick one question only. Ask for literature clusters, debates, and reading sequence.

Day 3: Use Gemini for PDFs and charts

Take figures, tables, and method sections from 3 to 5 key papers and let Gemini explain them.

Day 4: Use Academic Paper `outline-only`

Input the research question, literature map, and method. Ask only for structure, not full prose.

Day 5: Use Claude Sonnet 5 to revise one section

Revise one related-work section first. Require a change log.

Day 6: Use Academic Paper Reviewer `methodology-focus`

Ask it to check only methodology risks. Avoid generic peer review at this stage.

Day 7: Build your own research AI template

Record which models worked, which prompts were reliable, and which outputs needed verification.

Common Mistakes

Treating a Skill as a model. A Skill is a workflow; a model is the capability that executes it.
Treating model output as fact. Literature, citations, dataset splits, and configs must be verified against primary sources.
Asking AI to write the whole paper in one step. Structure and evidence should come before prose.
Asking reproduction questions that are too broad. paper-context-resolver is best for narrow reproduction-critical gaps.
Publishing generated figures without checking labels, arrows, and logic.
Using the most expensive model for every task. Many summaries, rewrites, translations, and formatting tasks can use lighter models.
Using the cheapest model for every task. Reviewer responses, complex arguments, and methodology checks are the wrong places to underinvest.

Final Checklist

Before opening an AI tool, ask:

1. Is this a long-text task or a PDF/chart task?
2. Is this writing, review, revision, reproduction, or coding?
3. Is this a narrow reproduction detail or general code debugging?
4. Is there a real Skill that standardizes this workflow?
5. Does the task benefit from using multiple models on demand?

If you answer these five questions first, model selection becomes much easier.

Verified Skills and Sources

Academic Research Skills: https://github.com/imbad0202/academic-research-skills
Deep Research Skill: https://github.com/imbad0202/academic-research-skills/tree/main/deep-research
Academic Paper Skill: https://github.com/imbad0202/academic-research-skills/tree/main/academic-paper
Academic Paper Reviewer Skill: https://github.com/imbad0202/academic-research-skills/tree/main/academic-paper-reviewer
paper-context-resolver: https://www.skills.sh/lllllllama/ai-paper-reproduction-skill/paper-context-resolver
gpt-image-2: https://www.skills.sh/doany-ai/skills/gpt-image-2
MathWorks MATLAB Symbolic Math Skills example: https://blogs.mathworks.com/matlab/2026/06/25/from-whiteboard-sketch-to-pareto-front-using-symbolic-math-skills-in-the-agentic-ai-playground/
