TokenMix Research Lab · 2026-07-02
Research AI Division of Labor: How to Combine GPT, Claude, Gemini, DeepSeek, and Real Skills
Summary
Many researchers still use AI in one giant chat box: literature review, paper polishing, code debugging, MATLAB errors, reviewer responses, and research diagrams all go into the same model.
That works for simple tasks, but it quickly becomes unstable:
- Long literature contexts get compressed or missed.
- Paper revision becomes sentence-level polishing instead of structure, evidence, and citation checking.
- Code, reproduction details, simulation, and diagram generation get mixed together, so the model sounds confident but misses the implementation boundary.
My current approach is different:
Models define capability boundaries.
Skills define workflows.
Scenarios decide which route to use.
Instead of asking “Which is better, GPT, Claude, or Gemini?”, ask:
Is this a long-text task, a PDF/chart task, a writing task, a peer-review task,
a reproduction-debugging task, or a figure-generation task?
Is there a real Skill that standardizes the workflow?
Which model fits that Skill best?
This article breaks down common academic scenarios and explains several verified Skills in detail: deep-research, academic-paper, academic-paper-reviewer, paper-context-resolver, and gpt-image-2. The MATLAB section is intentionally described as a symbolic math / simulation scenario, not as a skills.sh Skill.
First: Model, Skill, and Scenario Are Different Things
What a model is
A model is the underlying capability layer. For example:
- `openai/gpt-5.4` and `openai/gpt-5.5`: strong for research planning, outlines, complex decomposition, and multi-step writing.
- `anthropic/claude-sonnet-5`: strong for long documents, related work, academic rewriting, and large-context synthesis.
- `anthropic/claude-opus-4.8`: strong for complex argumentation, reviewer responses, rebuttal strategy, and high-quality prose.
- `google/gemini-2.5-pro` and `google/gemini-3.5-flash`: strong for PDFs, charts, screenshots, web materials, and multimodal understanding.
- `deepseek/deepseek-v4-pro`: strong for reasoning, code debugging, reproduction analysis, and technical route checking.
- `moonshot/kimi-k2.7-code`: strong for long code, function chains, engineering tasks, and repository-level reading.
- `qwen/qwen3.7-max` and `qwen/qwen3.7-plus`: useful for Chinese expression, local-language material, rewriting, and localization.
- `zhipu/glm-5v-turbo` and `zhipu/glm-5.2`: useful for visual reasoning, Chinese academic expression, and auxiliary review work.
- `openai/gpt-image-2` and `qwen/qwen-image-max`: useful for academic diagrams, presentation visuals, and Chinese image-text posters.
The model tells you what it may be good at. It does not guarantee that your workflow is sound.
What a Skill is
A Skill is a specialized workflow. It is not a model and not a single prompt. A good Skill specifies:
- when it should trigger,
- what inputs it needs,
- how the task is decomposed,
- what checks happen in the middle,
- what output format to return,
- and when not to use it.
For example, deep-research is not just “think harder.” It is an academic research workflow. academic-paper-reviewer is not just “review my paper.” It simulates a multi-perspective peer-review process.
What a scenario is
A scenario is the real task you need to finish:
- scan 30 papers and identify gaps,
- write an abstract and introduction,
- prepare a response to reviewers,
- resolve dataset split and evaluation protocol for reproduction,
- derive transfer functions or state-space equations in MATLAB,
- generate a mechanism diagram or presentation figure.
The scenario decides the route. The model is chosen after that.
My Academic AI Division Table
| Scenario | Preferred models | Skill / workflow | Why |
|---|---|---|---|
| Research planning | GPT-5.4 / GPT-5.5 | Deep Research Socratic or quick brief | Turns vague interests into variables, questions, and routes |
| Literature review / related work | Claude Sonnet 5 + Gemini 2.5 Pro | Deep Research lit-review | Claude handles long text; Gemini helps with PDFs and charts |
| Systematic review | Claude Sonnet 5 + Gemini 2.5 Pro | Deep Research systematic-review | Needs search strategy, inclusion criteria, and evidence tables |
| Fact checking | Gemini 2.5 Pro + Claude Sonnet 5 | Deep Research fact-check | One model checks materials; another writes cautiously |
| Paper outline | GPT-5.4 + Claude Sonnet 5 | Academic Paper outline-only | GPT structures; Claude improves academic expression |
| Paper revision | Claude Sonnet 5 + GPT-5.4 | Academic Paper revision-coach | Diagnoses logic, evidence, and style instead of only polishing |
| Citation checking | Claude Sonnet 5 + Gemini 2.5 Pro | Academic Paper citation-check | Checks whether claims and references align |
| Reviewer response | Claude Opus 4.8 + GPT-5.4 | Academic Paper rebuttal-audit | Requires tone, strategy, evidence, and boundaries |
| Simulated peer review | Claude Opus 4.8 + GLM-5.2 | Academic Paper Reviewer | Multi-perspective criticism beats generic praise |
| Paper reproduction | DeepSeek V4 Pro + Claude Sonnet 5 | paper-context-resolver | Resolves dataset split, preprocessing, checkpoint, and protocol gaps |
| Code debugging | DeepSeek V4 Pro + Kimi K2.7 Code | Coding workflow | Reasoning and long-code reading matter more than a named Skill |
| MATLAB derivation / simulation | GPT-5.4 + DeepSeek V4 Pro + Kimi K2.7 Code | MATLAB symbolic math scenario | Formula derivation, state-space equations, function drafts |
| Academic diagrams | GPT Image 2 + Qwen Image Max | gpt-image-2 Skill / image workflow | GPT Image 2 for structured layouts; Qwen for Chinese image-text |
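The table can also be kept as a small lookup so that the scenario, not the model name, is the entry point. The following is a minimal sketch, not part of any Skill: the `Route` class, the `ROUTES` dictionary keys, and `pick_route` are hypothetical names, and only a few rows of the table are copied in.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    models: tuple[str, ...]  # preferred models, in table order
    workflow: str            # Skill or workflow named in the table


# A few rows copied from the division table; the keys are hypothetical labels.
ROUTES = {
    "research_planning": Route(("GPT-5.4", "GPT-5.5"), "Deep Research Socratic or quick brief"),
    "literature_review": Route(("Claude Sonnet 5", "Gemini 2.5 Pro"), "Deep Research lit-review"),
    "reviewer_response": Route(("Claude Opus 4.8", "GPT-5.4"), "Academic Paper rebuttal-audit"),
    "paper_reproduction": Route(("DeepSeek V4 Pro", "Claude Sonnet 5"), "paper-context-resolver"),
    "code_debugging": Route(("DeepSeek V4 Pro", "Kimi K2.7 Code"), "Coding workflow"),
}


def pick_route(scenario: str) -> Route:
    """Look up the route for a scenario; fail loudly on unknown scenarios."""
    try:
        return ROUTES[scenario]
    except KeyError:
        known = ", ".join(sorted(ROUTES))
        raise ValueError(f"Unknown scenario {scenario!r}; known: {known}") from None


route = pick_route("paper_reproduction")
print(route.workflow, "->", " + ".join(route.models))
```

Raising on an unknown scenario is deliberate: if the task does not fit a row, that is a signal to classify the task first rather than to default to one model.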
Skill 1: Deep Research
Source: deep-research in the Academic Research Skills GitHub repository.
Repository: https://github.com/imbad0202/academic-research-skills
deep-research is an academic research workflow. It is most useful before formal writing begins, especially for literature review, systematic review, fact checking, and research-question clarification.
Common modes include:
- `lit-review`: literature review mode for mapping a field.
- `systematic-review`: systematic review mode for search strategy, inclusion/exclusion criteria, and evidence tables.
- `three-way-scan`: a three-way literature scan for quickly mapping background, methods, debates, or applications.
- `fact-check`: fact-checking mode for verifying whether a claim is supported.
- `socratic`: guided research-question clarification.
When to use it
Use it when:
- you are entering a new field,
- you need to turn 10 to 50 papers into a research map,
- you need related work instead of summary stitching,
- you want to verify a specific claim,
- you are preparing a systematic review with explicit criteria.
Do not use it when:
- you only need to polish one paragraph,
- you already have a complete manuscript and only need formatting,
- you expect the model to invent references.
Input template
For literature review:
Use deep-research in lit-review mode.
Topic: [your topic]
Field: [discipline]
Research goal: [question you want to answer]
Scope: [years, region, methods, population]
Must include: [keywords or theories]
Must exclude: [out-of-scope areas]
Output:
1. research question refinement
2. literature map
3. key debates
4. method clusters
5. evidence gaps
6. reading list with why each paper matters
For systematic review:
Use deep-research in systematic-review mode.
Research question:
Population / object:
Intervention / method:
Comparison:
Outcome:
Databases to consider:
Inclusion criteria:
Exclusion criteria:
Expected output: search strategy, screening table, evidence matrix, PRISMA-style summary.
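If you reuse the lit-review template above often, it may help to fill it from named fields and refuse to send it while anything is blank. This is only a sketch using Python's standard `string.Template`; the function name `build_lit_review_prompt` and the short field names are hypothetical, and the Skill itself does not require any of this.

```python
from string import Template

# Field names mirror the lit-review template above; the helper is hypothetical.
LIT_REVIEW_TEMPLATE = Template(
    "Use deep-research in lit-review mode.\n"
    "Topic: $topic\n"
    "Field: $field\n"
    "Research goal: $goal\n"
    "Scope: $scope\n"
    "Must include: $must_include\n"
    "Must exclude: $must_exclude\n"
    "Output:\n"
    "1. research question refinement\n"
    "2. literature map\n"
    "3. key debates\n"
    "4. method clusters\n"
    "5. evidence gaps\n"
    "6. reading list with why each paper matters\n"
)

REQUIRED = ("topic", "field", "goal", "scope", "must_include", "must_exclude")


def build_lit_review_prompt(**fields: str) -> str:
    """Fill the template, refusing to build a prompt with empty inputs."""
    missing = [name for name in REQUIRED if not fields.get(name, "").strip()]
    if missing:
        raise ValueError("Missing inputs: " + ", ".join(missing))
    return LIT_REVIEW_TEMPLATE.substitute(fields)
```

Calling `build_lit_review_prompt(topic="...")` with only one field raises a `ValueError` that lists the five missing inputs, which is the point: the gap is caught before the model has to guess.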
Model pairing
My preferred pairing:
- Claude Sonnet 5: long-context reading, literature synthesis, academic prose.
- Gemini 2.5 Pro: PDFs, charts, screenshots, and multimodal materials.
- GPT-5.4: research-question decomposition and output structure.
A practical chain:
GPT-5.4 clarifies the research question.
Deep Research + Claude Sonnet 5 builds the literature map.
Gemini 2.5 Pro handles PDF / chart / screenshot materials.
Claude summarizes the final related-work structure.
Skill 2: Academic Paper
Source: academic-paper in the Academic Research Skills GitHub repository.
Repository: https://github.com/imbad0202/academic-research-skills
academic-paper is for the writing stage. It is not merely a proofreading tool. It is a paper-writing pipeline covering configuration, structure, argumentation, abstract, citation, revision, rebuttal, and format conversion.
Common modes include:
- `outline-only`: generate only the paper structure and outline.
- `revision-coach`: diagnose and revise an existing draft.
- `citation-check`: check in-text citations, references, and claim alignment.
- `rebuttal-audit`: check whether a response to reviewers is complete and evidence-based.
- `format-convert`: convert structure or citation format.
- `abstract-only`: work only on the abstract.
When to use it
Use it when:
- you have a research direction and need a paper structure,
- you have a draft and want to improve logic, evidence, and style,
- you want to check whether a claim is too large for the evidence,
- you need to prepare a response to reviewers,
- you need to rewrite an abstract for a target venue.
Do not use it when:
- you have no materials and want a complete paper from nothing,
- you want fabricated references,
- you paste all reviewer comments without the manuscript or response draft.
Input templates
Outline:
Use academic-paper in outline-only mode.
Paper type: empirical / theoretical / review / case study
Field:
Target venue or style:
Research question:
Core claim:
Method / data:
Expected contribution:
Word count:
Citation style:
Output: section outline, argument flow, evidence needed for each section.
Revision coaching:
Use academic-paper in revision-coach mode.
Task: revise the following section.
Target: improve academic clarity, claim-evidence alignment, paragraph logic.
Do not invent citations.
Return:
1. diagnosis
2. revised version
3. change log
4. remaining risks
Draft:
[paste draft]
Reviewer response audit:
Use academic-paper in rebuttal-audit mode.
Inputs:
1. reviewer comments
2. current response draft
3. revised manuscript excerpt
Check:
- whether each concern is answered
- whether the tone is respectful
- whether evidence is specific
- whether any promise is not reflected in the manuscript
Model pairing
- Claude Sonnet 5: long drafts, related work, academic paragraphs.
- GPT-5.4: structure, argument decomposition, response strategy.
- GLM-5.2: Chinese academic writing and Chinese reviewer responses.
A stable paper-writing chain:
Deep Research builds the literature map.
Academic Paper outline-only creates the structure.
Claude Sonnet 5 drafts or revises long sections.
Academic Paper citation-check checks claim-reference alignment.
Academic Paper rebuttal-audit checks the response letter.
Skill 3: Academic Paper Reviewer
Source: academic-paper-reviewer in the Academic Research Skills GitHub repository.
Repository: https://github.com/imbad0202/academic-research-skills
academic-paper-reviewer is useful because it simulates a multi-perspective peer-review process. Instead of asking one model to “give feedback,” it configures different reviewer personas, including editor-in-chief, methodology reviewer, domain reviewer, cross-disciplinary reviewer, and Devil’s Advocate.
Common modes include:
- `methodology-focus`: checks research design, data, experiments, and statistical reporting.
- `guided`: guided review mode.
- `re-review`: verifies whether revisions answered previous comments.
- `calibration`: calibrates against review standards or previous reviewer behavior.
When to use it
Use it when:
- you want a pre-submission simulated review,
- you need to check whether your response letter missed anything,
- you want to stress-test the methodology section,
- you want to clarify the paper’s contribution,
- you want the strongest counter-arguments before submission.
Do not use it when:
- you treat it as a real journal decision,
- you do not provide enough methods or experimental details,
- you only need language polishing.
Input templates
Full pre-submission review:
Use academic-paper-reviewer in full review mode.
Paper field:
Target journal/conference:
Manuscript:
[paste manuscript or provide file]
Focus:
1. originality
2. methodology
3. claim-evidence alignment
4. literature positioning
5. limitations
6. likely reviewer objections
Output: editorial decision, reviewer reports, major/minor revisions, revision roadmap.
Methodology check:
Use academic-paper-reviewer in methodology-focus mode.
Only evaluate:
- research design
- data source
- sample selection
- measurement
- statistical / experimental validity
- robustness checks
Do not rewrite the paper yet.
Return a risk table and required fixes.
Re-review:
Use academic-paper-reviewer in re-review mode.
Inputs:
1. original reviewer comments
2. response letter
3. revised manuscript excerpts
Check whether every reviewer concern is actually resolved.
Mark unresolved, partially resolved, and fully resolved items.
Model pairing
- Claude Opus 4.8: complex argumentation, reviewer perspective, rebuttal strategy.
- GLM-5.2: Chinese papers and Chinese review comments.
- GPT-5.4: turning review feedback into a revision checklist.
Skill 4: paper-context-resolver
Source: paper-context-resolver on skills.sh.
Page: https://www.skills.sh/lllllllama/ai-paper-reproduction-skill/paper-context-resolver
This Skill is designed for paper reproduction. Its boundary matters: it is not a general paper summarizer and not an environment setup tool. It is meant for narrow reproduction-critical gaps that remain after reading the README and repository files.
It is suited for:
- dataset split,
- preprocessing,
- evaluation protocol,
- checkpoint mapping,
- runtime assumptions,
- conflicts between paper, README, config, and code.
When to use it
Use it when:
- the paper uses a dataset but the README does not specify the split,
- several checkpoints exist and you do not know which one matches a table,
- code defaults differ from the paper description,
- evaluation protocols have multiple versions,
- README and paper conflict and you need an evidence ledger.
Do not use it when:
- you want a general paper summary,
- you want it to configure your environment,
- you want it to run experiments,
- you only provide a title with no repo or specific question.
Input template
Use paper-context-resolver.
Paper:
[paper title / DOI / arXiv / PDF link]
Repository:
[GitHub repo]
Reproduction question:
Which dataset split and preprocessing settings correspond to Table 2?
Known conflict:
README says [A], paper says [B], config file says [C].
Output:
1. primary evidence from paper
2. evidence from repo / README / config
3. conflict table
4. most likely reproduction setting
5. uncertainty and what to test next
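The "conflict table" in the output above is easier to keep honest if each piece of evidence is recorded with its source and location. Below is one possible shape for such a ledger. It is an illustration only: the `Evidence` class and `conflict_table` function are hypothetical and do not describe the Skill's actual output format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    source: str    # e.g. "paper", "README", "config"
    location: str  # where the statement was found (section, file, line)
    value: str     # the setting that source states


def conflict_table(question: str, evidence: list[Evidence]) -> dict:
    """Group evidence by stated value and flag disagreement between sources."""
    by_value: dict[str, list[str]] = {}
    for item in evidence:
        by_value.setdefault(item.value, []).append(f"{item.source} ({item.location})")
    return {
        "question": question,
        "values": by_value,
        "conflict": len(by_value) > 1,
    }


# Placeholder values A / B / C stand in for the bracketed slots in the template.
ledger = conflict_table(
    "Which dataset split corresponds to Table 2?",
    [
        Evidence("README", "usage section", "A"),
        Evidence("paper", "experiments section", "B"),
        Evidence("config", "default config file", "C"),
    ],
)
print(ledger["conflict"])  # True: three sources, three different values
```

A ledger like this keeps "most likely reproduction setting" as a separate decision made after the evidence is listed, rather than something the summary quietly assumes.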
Model pairing
- Claude Sonnet 5: paper, README, issues, and long-context evidence.
- DeepSeek V4 Pro: config analysis, code logic, metrics, and reproduction discrepancies.
- Kimi K2.7 Code: large repositories and long function chains.
My usual workflow:
Read the README first.
Ask paper-context-resolver one narrow reproduction question.
Use DeepSeek / Kimi to inspect code and config.
Write the conclusion as a reproduction note.
Skill 5: gpt-image-2
Source: gpt-image-2 on skills.sh.
Page: https://www.skills.sh/doany-ai/skills/gpt-image-2
This Skill targets GPT Image 2 image generation and editing. It is useful for structured visual tasks, especially when the image needs layout, labels, arrows, workflow structure, or visual consistency.
Academic use cases:
- mechanism diagram drafts,
- technical route diagrams,
- research workflow diagrams,
- lab meeting cover slides,
- teaching knowledge cards and WeChat article visuals,
- bilingual or Chinese academic posters.
When to use it
Use it when:
- you know the modules that must appear in the image,
- the image needs flow, labels, and layout,
- you want to turn a mechanism into an explanatory diagram,
- you need multiple images in the same style.
Do not use it for:
- exact data plots,
- precise coordinates or statistical values,
- final experimental images in a paper.
Input templates
Academic workflow diagram:
Use gpt-image-2 for a clean academic workflow diagram.
Canvas: 16:9 slide
Style: white background, journal presentation style, minimal colors
Elements:
1. Data collection
2. Preprocessing
3. Model training
4. Evaluation
5. Error analysis
Text labels must be short and readable.
Do not add extra logos or decorative text.
Use arrows to show direction.
Chinese research card:
Create a vertical 3:4 Chinese academic knowledge card.
Style: white paper, dark blue text, mint highlights, coral accent.
Topic: GPT / Claude / Gemini research workflow.
Keep all text inside safe margins.
No real logos, no QR code, no watermark.
Model pairing
- GPT Image 2: structured diagrams, English labels, precise layout instructions.
- Qwen Image Max: Chinese image-text posters and dense Chinese labels.
Practical workflow:
GPT-5.4 writes the image structure as bullet points.
gpt-image-2 generates the English or neutral structure diagram.
Qwen Image Max generates the Chinese poster version if needed.
Human checks labels, arrows, and logic.
Final paper figures can be redrawn in vector tools if precision is required.
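For the last step, one way to redraw a simple workflow diagram as a vector figure is to emit Graphviz DOT text from the same step list used in the prompt. This is a sketch under two assumptions that are not from the article: that Graphviz is an acceptable vector tool for your venue, and that a plain left-to-right chain is the layout you want. The function name `workflow_dot` is hypothetical.

```python
def workflow_dot(steps: list[str], title: str = "workflow") -> str:
    """Return Graphviz DOT text for a left-to-right chain of labelled boxes."""
    lines = [f'digraph "{title}" {{', "  rankdir=LR;", "  node [shape=box];"]
    for i, label in enumerate(steps):
        escaped = label.replace('"', '\\"')  # keep quotes inside labels valid
        lines.append(f'  s{i} [label="{escaped}"];')
    for i in range(len(steps) - 1):
        lines.append(f"  s{i} -> s{i + 1};")  # one arrow per consecutive pair
    lines.append("}")
    return "\n".join(lines)


# The five elements from the workflow-diagram template above.
STEPS = ["Data collection", "Preprocessing", "Model training", "Evaluation", "Error analysis"]
dot_text = workflow_dot(STEPS)
print(dot_text)
```

If Graphviz is installed, saving the text to a file and running `dot -Tsvg workflow.dot -o workflow.svg` should produce an SVG whose labels and arrows are exact, which the generated image cannot guarantee.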
MATLAB: Treat It as a Symbolic Math / Simulation Scenario
Source: MathWorks MATLAB Blog on Agentic AI Playground and Symbolic Math Skills.
Article: https://blogs.mathworks.com/matlab/2026/06/25/from-whiteboard-sketch-to-pareto-front-using-symbolic-math-skills-in-the-agentic-ai-playground/
For MATLAB, I would not label it as a skills.sh Skill. A safer wording is:
MATLAB symbolic math scenario
MATLAB derivation and simulation scenario
Symbolic Math Skills example in Agentic AI Playground
Useful tasks include:
- deriving transfer functions,
- converting to state-space equations,
- manipulating symbolic variables and equations,
- variable-precision arithmetic,
- generating MATLAB function drafts,
- explaining MATLAB errors,
- turning derivations into lab notes.
Input templates
Derivation:
I am working on a MATLAB simulation.
Goal:
Derive the transfer function and convert it to state-space equations.
Given equations:
[paste equations]
Please:
1. define variables
2. show symbolic derivation
3. identify assumptions
4. produce MATLAB function draft
5. list checks I should run in MATLAB
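To make the expected output of such a derivation request concrete, here is a short worked example. The system is an assumption chosen for illustration (a mass-spring-damper with mass m, damping c, stiffness k, input force u, and position as output); it does not come from the MathWorks article.

```latex
% Assumed example system (illustration only): mass m, damping c, stiffness k,
% input force u(t), output y = x (position).
\begin{align}
m\ddot{x}(t) + c\dot{x}(t) + k\,x(t) &= u(t) \\
\left(m s^{2} + c s + k\right) X(s) &= U(s) \quad \text{(zero initial conditions)} \\
G(s) = \frac{X(s)}{U(s)} &= \frac{1}{m s^{2} + c s + k}
\end{align}
% State choice: x_1 = x, x_2 = \dot{x}
\begin{align}
\begin{bmatrix} \dot{x}_1 \\ \dot{x}_2 \end{bmatrix}
&= \begin{bmatrix} 0 & 1 \\ -\tfrac{k}{m} & -\tfrac{c}{m} \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}
+ \begin{bmatrix} 0 \\ \tfrac{1}{m} \end{bmatrix} u, \\
y &= \begin{bmatrix} 1 & 0 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}
\end{align}
```

A reasonable check to request under item 5 is that converting the state-space form back to a transfer function reproduces G(s), for example with the Control System Toolbox functions `ss` and `tf`, if that toolbox is available to you.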
Debugging:
Here is my MATLAB error and code.
Task:
Explain the error, locate the likely cause, and suggest a minimal fix.
Constraints:
Do not rewrite the whole project.
Keep variable names unchanged.
Return the corrected snippet and why it works.
Model pairing
- GPT-5.4: explanation, derivation steps, function drafts.
- DeepSeek V4 Pro: reasoning, debugging, algorithmic logic.
- Kimi K2.7 Code: long MATLAB code and multi-function chains.
Three Complete Workflows
Workflow 1: From topic idea to literature review
1. GPT-5.4 turns a vague interest into three research questions.
2. Deep Research `socratic` clarifies variables, object, method, and boundary.
3. Deep Research `lit-review` builds the literature map.
4. Gemini 2.5 Pro handles PDFs, tables, and screenshots.
5. Claude Sonnet 5 writes the related-work structure.
6. Academic Paper `citation-check` checks claim-reference alignment.
Best for: thesis proposal, doctoral topic exploration, early course paper work.
Workflow 2: From draft to pre-submission check
1. Academic Paper `outline-only` checks the structure.
2. Claude Sonnet 5 revises introduction / related work.
3. Academic Paper `revision-coach` diagnoses each section.
4. Academic Paper Reviewer `methodology-focus` checks methods.
5. Academic Paper Reviewer full review simulates peer review.
6. GPT-5.4 turns major revisions into an action checklist.
Best for: pre-submission self-review, revision planning, finding logic gaps.
Workflow 3: From reproduction issue to experiment note
1. Read README and official repo first.
2. paper-context-resolver answers one reproduction-critical question.
3. Claude Sonnet 5 organizes paper / README / issue evidence.
4. DeepSeek V4 Pro analyzes code and evaluation protocol.
5. Kimi K2.7 Code reads long code paths if needed.
6. GPT-5.4 writes a reproduction note.
Best for: deep learning reproduction, mismatched metrics, unclear data splits.
Why Pay-as-You-Go Multi-Model Access Fits Researchers
Academic AI usage is uneven.
During paper deadlines, rebuttal weeks, or reproduction work, usage can be intense. During experiments, classes, or reading periods, you may not open the tools for days.
Monthly subscriptions across every platform create friction:
- fixed monthly fees but uneven usage,
- Claude for long text, Gemini for charts, GPT for planning, all in separate subscriptions,
- lab groups cannot easily track who used what,
- API keys, balances, and invoices are scattered.
That is why I prefer matching tools to task type:
long text -> Claude
PDF / chart -> Gemini
planning / structure -> GPT
code / reasoning -> DeepSeek / Kimi
Chinese writing -> Qwen / GLM
research diagrams -> GPT Image 2 / Qwen Image Max
With a multi-model entry point like TokenMix, these models can be used from one account on a pay-as-you-go basis. For researchers, the point is not finding one model that is always best. The point is avoiding fixed monthly costs for models you only need occasionally.
Before You Run Any Skill: Build a Research AI Input Pack
Many people think AI is unstable because the model is weak. In academic work, the more common reason is scattered input. Research tasks need context, boundaries, evidence, and output constraints. If you paste random fragments every time, the model has to guess the workflow.
I recommend preparing a reusable input pack:
| Input material | What it contains | Useful for |
|---|---|---|
| Project card | topic, field, object, keywords, scope, exclusions | Deep Research, Academic Paper |
| Literature list | DOI, title, year, method, finding, why it matters | Deep Research, Academic Paper |
| Manuscript draft | title, abstract, introduction, method, results, discussion | Academic Paper, Academic Paper Reviewer |
| Citation table | in-text citation, reference, supported claim | Academic Paper citation-check |
| Reviewer package | reviewer comments, response draft, revised excerpts | Academic Paper rebuttal-audit, Reviewer re-review |
| Reproduction package | paper, repo, README, config, error, metric gap | paper-context-resolver, DeepSeek, Kimi |
| MATLAB package | equations, variables, current code, error, expected output | MATLAB symbolic math / debugging scenario |
| Figure package | purpose, size, required elements, forbidden elements, text | gpt-image-2, Qwen Image Max |
A minimal project card:
Project card
Topic:
Field:
Research object:
Core question:
Keywords:
Scope:
Do not cover:
Current stage: topic exploration / literature review / draft / revision / reproduction
Target output:
Deadline:
The key benefit: models and Skills can change, but your context stays stable.
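One way to keep that context stable is to store the card as structured data and render the text block from it. The sketch below assumes Python dataclasses; the class name `ProjectCard` and its attribute names are hypothetical, chosen to mirror the card's labels.

```python
from dataclasses import dataclass, fields

# Stages listed on the "Current stage" line of the card above.
STAGES = ("topic exploration", "literature review", "draft", "revision", "reproduction")


@dataclass
class ProjectCard:
    topic: str
    field: str
    research_object: str
    core_question: str
    keywords: str
    scope: str
    do_not_cover: str
    current_stage: str
    target_output: str
    deadline: str

    def __post_init__(self) -> None:
        # Reject stages that are not on the card's list.
        if self.current_stage not in STAGES:
            raise ValueError(f"current_stage must be one of {STAGES}")

    def render(self) -> str:
        """Render the card as a plain-text block in the same label order."""
        lines = ["Project card"]
        for f in fields(self):
            label = f.name.replace("_", " ").capitalize()
            lines.append(f"{label}: {getattr(self, f.name)}")
        return "\n".join(lines)
```

The rendered text can be pasted at the top of any Skill prompt, so switching models changes the executor but not the context.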
General Rule: Do Not Ask the Skill to “Write” Too Early
The most common academic AI mistake is asking for the final draft immediately.
A safer pattern:
Step 1: ask the model to identify task type and missing inputs
Step 2: let the Skill produce intermediate structure
Step 3: generate the final prose, table, figure, or revision
For example, instead of saying:
Write my related work.
Use:
I am preparing a related work section.
First, check whether my inputs are sufficient:
1. Is the research question clear?
2. Can the literature be grouped by method or debate?
3. Are key opposing papers missing?
4. Does each claim have citation support?
Do not write the prose yet. Return only missing inputs and a suggested workflow.
This prevents a lot of fluent but hollow academic writing.
Output Quality Gates for Each Skill
What a good Deep Research output should include
A strong deep-research output should not be just “10 paper summaries.” It should contain:
- refined research questions,
- literature clusters by theory, method, object, or finding,
- key debates and why papers disagree,
- methodology map,
- evidence strength,
- research gaps with boundaries,
- a must-read sequence.
Quality-gate prompt:
Audit the deep-research output.
Check whether it includes:
1. refined research questions
2. literature clusters
3. key debates
4. evidence strength
5. methodology map
6. research gaps
7. must-read sequence
Mark each item as pass / partial / missing.
If missing, tell me what input I need to provide.
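If the audit comes back as one line per item, the items that still need work can be pulled out mechanically. This sketch assumes a line shape of `<number>. <item>: <pass|partial|missing>`, which is an assumption about the model's reply and not something the Skill guarantees; the function name `open_items` is hypothetical.

```python
import re

# Assumed line shape: "<number>. <item>: <pass|partial|missing>".
MARK_LINE = re.compile(
    r"^\s*\d+\.\s*(?P<item>.+?)\s*:\s*(?P<mark>pass|partial|missing)\s*$",
    re.IGNORECASE,
)


def open_items(audit_text: str) -> dict[str, str]:
    """Return every checklist item not marked 'pass', keyed by item name."""
    result: dict[str, str] = {}
    for line in audit_text.splitlines():
        match = MARK_LINE.match(line)
        if match and match["mark"].lower() != "pass":
            result[match["item"]] = match["mark"].lower()
    return result
```

Lines that do not fit the assumed shape are ignored, so free-text commentary in the reply does not break the tally, but it also means a differently formatted audit will silently return nothing and should be read by hand.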
What a good Academic Paper output should include
A strong academic-paper output should make the manuscript more publishable, not just more polished:
- each section has a clear function,
- introduction narrows from problem to gap to contribution,
- related work is a research map rather than stitched summaries,
- method contains reproducible details,
- results explain what metrics mean,
- discussion does not overclaim,
- citations support specific claims.
Quality-gate prompt:
Audit this Academic Paper output.
Check:
1. Does each section have a clear function?
2. Are claims proportional to evidence?
3. Are citations used to support specific claims?
4. Is the contribution explicit?
5. Are limitations honest and specific?
6. Does the revised text preserve my intended meaning?
Return a table: issue / severity / suggested fix.
What a good Academic Paper Reviewer output should include
Simulated review has two failure modes: vague praise and performative harshness.
A useful review should include:
- editorial decision tendency and rationale,
- major concerns that affect validity,
- minor concerns separated from fatal issues,
- methodology risks,
- contribution clarity,
- revision roadmap,
- confidence level and missing evidence.
Quality-gate prompt:
Audit the simulated peer review.
Check:
1. Are major concerns evidence-based?
2. Are minor comments separated from fatal issues?
3. Does the review identify methodology risks?
4. Does it give actionable revision steps?
5. Does it avoid asking for irrelevant extra work?
Return a reviewer-quality score from 1 to 5 and explain.
What a good paper-context-resolver output should include
This Skill is narrow by design. A good output is an evidence ledger, not a paper summary.
It should include:
- the exact reproduction question,
- paper evidence,
- repo / README / config evidence,
- conflict table,
- most likely reproduction setting,
- uncertainty,
- next minimal test.
Quality-gate prompt:
Audit the paper-context-resolver output.
It should not summarize the whole paper.
It should answer one reproduction-critical question.
Check:
1. Is the question narrow?
2. Does it cite paper evidence?
3. Does it cite repo/config evidence?
4. Does it record conflicts?
5. Does it separate certainty from hypothesis?
6. Does it suggest the next minimal test?
What a good gpt-image-2 output should include
Academic figures are not only about visual appeal. A useful generated figure should have:
- correct structure,
- readable labels,
- a single clear message,
- reusable style,
- no fabricated data.
Quality-gate prompt:
Evaluate this generated research figure.
Check:
1. Are all required elements present?
2. Is the flow direction correct?
3. Are text labels readable?
4. Is there any extra or hallucinated element?
5. Is the style appropriate for paper / slides / social post?
6. What should be edited manually before publishing?
Model Fallback: How to Switch When the First Model Fails
Model switching should not be random. Switch based on failure type:
| Failure type | Common symptom | Fallback strategy |
|---|---|---|
| Missing long-context details | appendix, tables, or limitations ignored | Claude Sonnet 5 / Gemini 2.5 Pro |
| Misreading charts | axes, legends, table meaning wrong | Gemini 2.5 Pro / Gemini 3.5 Flash |
| Hollow logic | many claims, weak reasoning chain | GPT-5.4 / DeepSeek V4 Pro |
| Broken code edits | too many changes, renamed variables | Kimi K2.7 Code / DeepSeek V4 Pro |
| Awkward Chinese prose | translated tone, unnatural phrasing | Qwen3.7 Max / GLM-5.2 |
| Too gentle as reviewer | praise without critique | Claude Opus 4.8 + Academic Paper Reviewer |
| Too harsh as reviewer | irrelevant extra demands | guided mode with limited criteria |
| Bad image text | wrong labels, distorted Chinese text | Qwen Image Max or manual layout |
A useful fallback instruction:
The previous model output failed because:
[specific failure]
Redo the task with a different emphasis:
- preserve original meaning
- fix only the failed part
- do not rewrite unrelated sections
- explain what changed and why
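The table and the instruction can be combined so that a failure type yields both the next model to try and the retry text. A minimal sketch follows; the `FALLBACKS` keys and the `fallback_request` function are hypothetical names, and only the rows with a two-model fallback are copied from the table.

```python
# Rows copied from the fallback table; the keys are hypothetical short labels.
FALLBACKS = {
    "missing_long_context": ["Claude Sonnet 5", "Gemini 2.5 Pro"],
    "misread_chart": ["Gemini 2.5 Pro", "Gemini 3.5 Flash"],
    "hollow_logic": ["GPT-5.4", "DeepSeek V4 Pro"],
    "broken_code_edit": ["Kimi K2.7 Code", "DeepSeek V4 Pro"],
    "awkward_chinese": ["Qwen3.7 Max", "GLM-5.2"],
}


def fallback_request(failure_type: str, failure_detail: str, already_tried: str) -> tuple[str, str]:
    """Pick the next untried model for a failure type and build the retry instruction."""
    candidates = [m for m in FALLBACKS[failure_type] if m != already_tried]
    if not candidates:
        raise LookupError(f"No untried fallback left for {failure_type!r}")
    instruction = "\n".join([
        "The previous model output failed because:",
        failure_detail,
        "Redo the task with a different emphasis:",
        "- preserve original meaning",
        "- fix only the failed part",
        "- do not rewrite unrelated sections",
        "- explain what changed and why",
    ])
    return candidates[0], instruction


model, text = fallback_request("broken_code_edit", "variables were renamed", "Kimi K2.7 Code")
print(model)  # DeepSeek V4 Pro
```

Requiring a specific `failure_detail` is the useful constraint here: a retry without a stated failure tends to reproduce the same output on a different model.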
Lab / Research Group Workflow: Do Not Let Everyone Ask From Scratch
For a lab group, the worst pattern is every student asking from zero. A better pattern is shared AI-ready material:
lab_ai/
  project_cards/
  literature_matrix/
  paper_drafts/
  reviewer_comments/
  reproduction_notes/
  prompts/
  figures/
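Creating this layout can be scripted so that every project in the group starts from the same folders. The sketch below assumes the seven folders are direct children of `lab_ai/`; the function name `scaffold` is hypothetical.

```python
from pathlib import Path

# Assumed to be direct children of lab_ai/, following the listing above.
SUBFOLDERS = (
    "project_cards",
    "literature_matrix",
    "paper_drafts",
    "reviewer_comments",
    "reproduction_notes",
    "prompts",
    "figures",
)


def scaffold(root: Path) -> list[Path]:
    """Create lab_ai/ and its subfolders under root; safe to run repeatedly."""
    created = []
    for name in SUBFOLDERS:
        folder = root / "lab_ai" / name
        folder.mkdir(parents=True, exist_ok=True)  # no error if it already exists
        created.append(folder)
    return created
```

Running `scaffold(Path("."))` from a shared project directory creates the tree in place; because existing folders are left untouched, it can be rerun when a new subfolder is added to the tuple.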
Suggested shared tables
Literature matrix:
| Field | Meaning |
|---|---|
| paper_id | internal ID |
| citation | formatted citation |
| research question | what the paper answers |
| method | method |
| data | data |
| key finding | main finding |
| limitation | stated limitation |
| useful for | which section it supports |
| evidence strength | strong / medium / weak |
Reproduction note:
| Field | Meaning |
|---|---|
| paper | target paper |
| repo | official repository |
| question | narrow reproduction question |
| paper evidence | evidence from paper |
| repo evidence | evidence from repository |
| conflict | conflict point |
| chosen setting | final selected setting |
| next test | next experiment |
Reviewer response tracker:
| Field | Meaning |
|---|---|
| reviewer | R1 / R2 / R3 |
| comment id | comment number |
| issue type | method / writing / citation / experiment |
| response status | done / partial / pending |
| manuscript change | where it changed |
| evidence | supporting material |
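If the tracker is kept as a CSV file with these fields as column headers, the open items can be listed before a response letter goes out. This is a sketch using Python's standard `csv` module; storing the tracker as CSV and the function name `unresolved` are assumptions, not part of any Skill.

```python
import csv
import io

# Column names taken from the reviewer response tracker fields above.
COLUMNS = ["reviewer", "comment id", "issue type", "response status", "manuscript change", "evidence"]


def unresolved(tracker_csv: str) -> list[dict[str, str]]:
    """Return rows whose response status is anything other than 'done'."""
    reader = csv.DictReader(io.StringIO(tracker_csv))
    missing = set(COLUMNS) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"Tracker is missing columns: {sorted(missing)}")
    return [row for row in reader if row["response status"].strip().lower() != "done"]


# Illustrative rows only; real trackers would be read from a shared file.
example = (
    "reviewer,comment id,issue type,response status,manuscript change,evidence\n"
    "R1,1,method,done,methods section,new robustness table\n"
    "R1,2,citation,partial,related work,\n"
    "R2,1,experiment,pending,,\n"
)
for row in unresolved(example):
    print(row["reviewer"], row["comment id"], row["response status"])
```

Treating both `partial` and `pending` as unresolved matches the purpose of the re-review step: a comment counts as answered only when the manuscript change is actually in place.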
The goal is to turn Skill outputs into lab assets, not disposable chat logs.
A One-Week Starter Plan
Day 1: Build your project card
Use GPT-5.4 to turn your research interest into three concrete research questions.
Day 2: Run Deep Research lit-review
Pick one question only. Ask for literature clusters, debates, and reading sequence.
Day 3: Use Gemini for PDFs and charts
Take figures, tables, and method sections from 3 to 5 key papers and let Gemini explain them.
Day 4: Use Academic Paper outline-only
Input the research question, literature map, and method. Ask only for structure, not full prose.
Day 5: Use Claude Sonnet 5 to revise one section
Revise one related-work section first. Require a change log.
Day 6: Use Academic Paper Reviewer methodology-focus
Ask it to check only methodology risks. Avoid generic peer review at this stage.
Day 7: Build your own research AI template
Record which models worked, which prompts were reliable, and which outputs needed verification.
Common Mistakes
- Treating a Skill as a model. A Skill is a workflow; a model is the execution capability.
- Treating model output as fact. Literature, citations, dataset splits, and configs must be verified against primary sources.
- Asking AI to write the whole paper in one step. Structure and evidence should come before prose.
- Asking reproduction questions that are too broad. `paper-context-resolver` is best for narrow reproduction-critical gaps.
- Publishing generated figures without checking labels, arrows, and logic.
- Using the most expensive model for every task. Many summaries, rewrites, translations, and formatting tasks can use lighter models.
- Using the cheapest model for every task. Reviewer responses, complex arguments, and methodology checks are the wrong places to underinvest.
Final Checklist
Before opening an AI tool, ask:
1. Is this a long-text task or a PDF/chart task?
2. Is this writing, review, revision, reproduction, or coding?
3. Is this a narrow reproduction detail or general code debugging?
4. Is there a real Skill that standardizes this workflow?
5. Does the task benefit from using multiple models on demand?
If you answer these five questions first, model selection becomes much easier.
Verified Skills and Sources
- Academic Research Skills: https://github.com/imbad0202/academic-research-skills
- Deep Research Skill: https://github.com/imbad0202/academic-research-skills/tree/main/deep-research
- Academic Paper Skill: https://github.com/imbad0202/academic-research-skills/tree/main/academic-paper
- Academic Paper Reviewer Skill: https://github.com/imbad0202/academic-research-skills/tree/main/academic-paper-reviewer
- paper-context-resolver: https://www.skills.sh/lllllllama/ai-paper-reproduction-skill/paper-context-resolver
- gpt-image-2: https://www.skills.sh/doany-ai/skills/gpt-image-2
- MathWorks MATLAB Symbolic Math Skills example: https://blogs.mathworks.com/matlab/2026/06/25/from-whiteboard-sketch-to-pareto-front-using-symbolic-math-skills-in-the-agentic-ai-playground/