Read 900+ pages in
90 seconds.
Upload any book PDF — your novel, scripture, textbook or research paper — and our ultra-pro algorithm distills the Core Soul: index, concepts, rules, smart-ways and up to 201 key sentences. 100% client-side. Zero upload to any server.
⚙️ The Soul Engine
Drop a book PDF. The engine streams pages, tokenizes characters, builds an inverted-index TF-IDF graph, runs sparse-ANN TextRank with MMR re-ranking, and surfaces only what truly matters. Tested up to 900+ pages.
Drop your Book PDF here
or click to browse · Tested up to 900+ pages · Stays on your device
🧬 The 7-Stage Pipeline
Each book passes through a deterministic pipeline. No black-box AI hallucination — only verifiable graph mathematics.
Stream & Sanitize
PDF.js streams pages in batches of 8 with cleanup() to keep memory flat. We strip headers, footers, page numbers, and OCR noise.
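As a minimal sketch, the batching idea looks like this. `streamPages` is a hypothetical helper, not the engine's actual code; it assumes a PDF.js `PDFDocumentProxy` (the object `getDocument(...).promise` resolves to):

```javascript
// Sketch of batched page streaming. `pdf` is shaped like PDF.js's
// PDFDocumentProxy; the batch size of 8 mirrors the text above.
async function streamPages(pdf, batchSize = 8) {
  const texts = [];
  for (let start = 1; start <= pdf.numPages; start += batchSize) {
    const end = Math.min(start + batchSize - 1, pdf.numPages);
    const batch = [];
    for (let i = start; i <= end; i++) batch.push(pdf.getPage(i));
    for (const page of await Promise.all(batch)) {
      const content = await page.getTextContent();
      texts.push(content.items.map((it) => it.str).join(" "));
      page.cleanup(); // release page resources to keep memory flat
    }
  }
  return texts;
}
```

Calling `cleanup()` after each page is what keeps the footprint flat: only one batch of page objects is ever alive at a time.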
PDF.js · Streaming
Sentence & Chapter Cut
Regex-driven sentence segmentation respects abbreviations (Dr., e.g., i.e.). Chapter detection finds "Chapter N", roman numerals, and ALL-CAPS headings.
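A minimal sketch of abbreviation-aware splitting; the abbreviation list here is an illustrative assumption, not the engine's actual set:

```javascript
// Don't cut a sentence when the buffer ends in a known abbreviation.
const ABBREV = /\b(Dr|Mr|Mrs|Ms|Prof|St|e\.g|i\.e|etc|vs)\.$/;

function splitSentences(text) {
  const sentences = [];
  let buf = "";
  // Split after ., ! or ? followed by whitespace, then re-join false cuts.
  for (const piece of text.split(/(?<=[.!?])\s+/)) {
    buf = buf ? buf + " " + piece : piece;
    if (!ABBREV.test(buf.trim())) {
      sentences.push(buf.trim());
      buf = "";
    }
  }
  if (buf) sentences.push(buf.trim());
  return sentences;
}
```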
NLP · Regex
Token + IDF Build
Stop-word-filtered tokenization → inverted index. Laplace-smoothed IDF = log((N+1)/(df+1)) + 1. The top-K tokens per sentence become its sparse vector.
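A condensed sketch of this stage. The stop-word list and K = 12 here are illustrative placeholders (the math section below also uses K = 12):

```javascript
// Illustrative stop words and sparse-vector width, not the engine's real lists.
const STOPWORDS = new Set(["the", "a", "an", "of", "and", "is", "to", "in"]);
const TOP_K = 12;

function buildIndex(sentences) {
  const df = new Map(); // token -> number of sentences containing it
  const tokenized = sentences.map((s) => {
    const toks = (s.toLowerCase().match(/[a-z]+/g) || [])
      .filter((t) => !STOPWORDS.has(t));
    for (const t of new Set(toks)) df.set(t, (df.get(t) || 0) + 1);
    return toks;
  });
  const N = sentences.length;
  const idf = (t) => Math.log((N + 1) / (df.get(t) + 1)) + 1; // smoothed IDF
  // Sparse vector: a sentence's top-K tokens by tf·idf weight.
  const vectors = tokenized.map((toks) => {
    const tf = new Map();
    for (const t of toks) tf.set(t, (tf.get(t) || 0) + 1);
    return [...tf.entries()]
      .map(([t, f]) => [t, f * idf(t)])
      .sort((a, b) => b[1] - a[1])
      .slice(0, TOP_K);
  });
  return { vectors, idf };
}
```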
TF-IDF · Sparse
Pre-Filter (≥ 2.5k sentences)
For mega-books (700+ pages), we score every sentence by IDF mass × length-fit × positional weight, then keep top 2500 candidates — without losing intros & conclusions.
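One way to sketch that score (the ideal length of 20 words and the 0.5 positional boost are assumed constants for illustration; the engine's real weights are not shown):

```javascript
// Hypothetical pre-filter score: IDF mass × length fit × positional weight.
function prefilterScore(vector, position, total, wordCount) {
  const idfMass = vector.reduce((sum, [, w]) => sum + w, 0);
  const lengthFit = Math.exp(-Math.abs(wordCount - 20) / 20); // peaks near 20 words
  const p = position / total;                 // normalised position in the book
  const positional = 1 + 0.5 * Math.abs(p - 0.5) * 2; // boosts intros & outros
  return idfMass * lengthFit * positional;
}
```

Scoring every sentence this way, sorting, and slicing the top 2500 gives the candidate set; the positional term is what keeps intros and conclusions from being cut.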
Heuristic
ANN-TextRank (PageRank)
Cosine similarity computed only between candidates sharing top tokens. Power iteration with damping d=0.85 converges in ~30 iters. Complexity ≈ O(N·k) instead of O(N²).
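The power iteration can be sketched as follows, assuming the sparse graph is already built as adjacency lists of `[neighbour, similarity]` pairs (only candidates sharing top tokens):

```javascript
// Power iteration over a sparse similarity graph, damping d = 0.85.
function pageRank(neighbors, d = 0.85, iters = 30) {
  const N = neighbors.length;
  // Out-weight normaliser: Σ_k sim(j,k) for each node j.
  const outSum = neighbors.map((ns) => ns.reduce((s, [, w]) => s + w, 0) || 1);
  let pr = new Array(N).fill(1 / N);
  for (let it = 0; it < iters; it++) {
    const next = new Array(N).fill((1 - d) / N);
    for (let j = 0; j < N; j++) {
      for (const [i, w] of neighbors[j]) {
        next[i] += d * (w / outSum[j]) * pr[j]; // j votes for neighbour i
      }
    }
    pr = next;
  }
  return pr;
}
```

Because each node only touches its k neighbours, one iteration costs O(N·k) rather than the O(N²) of a dense similarity matrix.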
Graph · PageRank
MMR Re-Rank
Maximal Marginal Relevance greedily picks sentences that maximize λ·importance − (1−λ)·redundancy. The λ slider lets you trade focus vs novelty.
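A sketch of the greedy selection loop, assuming `pr` holds the PageRank scores and `sim(i, j)` the sparse cosine from the previous stage:

```javascript
// Greedy MMR: each round, pick the sentence with the best trade-off
// between importance (pr) and redundancy against what's already chosen.
function mmrSelect(pr, sim, k, lambda = 0.72) {
  const selected = [];
  const remaining = new Set(pr.map((_, i) => i));
  while (selected.length < k && remaining.size > 0) {
    let best = -1, bestScore = -Infinity;
    for (const i of remaining) {
      const redundancy = selected.length
        ? Math.max(...selected.map((r) => sim(i, r)))
        : 0;
      const score = lambda * pr[i] - (1 - lambda) * redundancy;
      if (score > bestScore) { bestScore = score; best = i; }
    }
    selected.push(best);
    remaining.delete(best);
  }
  return selected;
}
```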
MMR · Diversity
Lens Routing
Concepts / Rules / Smart-Ways are extracted by trigger-lexicons + dependency cues, then re-scored against the global graph. Index/TOC is detected via heading patterns.
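A toy sketch of trigger-lexicon routing. These word lists are assumptions for illustration, not the engine's actual lexicons, and the real stage also re-scores matches against the global graph:

```javascript
// Each lens fires on its trigger words; first match wins.
const LENSES = {
  rules: /\b(must|never|always|shall|should not)\b/i,
  smartWays: /\b(trick|shortcut|tip|hack|technique|method)\b/i,
  concepts: /\b(is defined as|refers to|means|is called)\b/i,
};

function routeLens(sentence) {
  for (const [lens, pattern] of Object.entries(LENSES)) {
    if (pattern.test(sentence)) return lens;
  }
  return null; // falls back to the global summary pool
}
```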
Lexicon · Routing
📐 The PhD Mathematics
For the curious mind: every formula powering the engine, written with IIT/MSc-level rigor.
Term Weight
w(t,s) = tf(t,s) · ( log( (N+1) / (df(t)+1) ) + 1 )
The +1 smoothing prevents division by zero for unseen tokens and tempers extreme weights on rare ones, which matters when a book carries unique mantras or domain jargon.
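A worked example with assumed numbers: a token appearing twice in a sentence (tf = 2), found in 9 of a book's 99 sentences (df = 9, N = 99):

```javascript
// w = tf · (log((N+1)/(df+1)) + 1) = 2 · (ln 10 + 1)
const w = 2 * (Math.log((99 + 1) / (9 + 1)) + 1); // ≈ 6.605
```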
Sentence Similarity
sim(s_i, s_j) = ⟨v_i, v_j⟩ / (‖v_i‖ · ‖v_j‖)
Computed on sparse top-K vectors — typically K=12 — making the inner product run in O(K) per pair.
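With vectors stored as top-K `[token, weight]` lists (the representation assumed throughout), the cosine really does run in O(K):

```javascript
// O(K) cosine over two sparse top-K vectors.
function sparseCosine(a, b) {
  const mapB = new Map(b);
  let dot = 0, na = 0, nb = 0;
  for (const [, w] of b) nb += w * w;
  for (const [t, w] of a) {
    na += w * w;
    if (mapB.has(t)) dot += w * mapB.get(t); // only shared tokens contribute
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb) || 1);
}
```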
PageRank Iteration
PR(s_i) = (1−d)/N + d · Σ_{j∈In(i)} [ sim(i,j) / Σ_{k} sim(j,k) ] · PR(s_j)
d = 0.85. Sparse neighbour lists keep each iteration cheap, so ≈30 iterations suffice even for 30k sentences.
Diversity Selection
s* = arg max_{s∈C∖R} [ λ·PR(s) − (1−λ)·max_{r∈R} sim(s,r) ]
λ slider = 0.30 (very diverse) → 0.95 (laser-focused). Default 0.72 = Soul Mode.
Beta-Prior Bias
π(p) = 1 + α·Beta(p; 0.6, 0.6)
U-shaped curve gently boosts intros and conclusions where authors traditionally place thesis statements.
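The U shape is easy to see from the unnormalised Beta(0.6, 0.6) kernel; the boost strength α = 0.3 below is an assumed value for illustration:

```javascript
// Unnormalised Beta(p; 0.6, 0.6) kernel: p^(0.6−1) · (1−p)^(0.6−1).
// Positions are mapped to the open interval (0, 1).
const alpha = 0.3; // assumed boost strength
const betaKernel = (p) => Math.pow(p, -0.4) * Math.pow(1 - p, -0.4);
const pi = (p) => 1 + alpha * betaKernel(p);
```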
Soul Score
S(s) = z(PR) + 0.4·z(IDF-mass) + 0.2·π(p) − 0.3·z(redundancy)
Normalised z-scores keep the metric stable across 80-page essays and 900-page tomes alike.
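A sketch of the combination, using the weights from the formula above; note that per the formula, only PR, IDF mass, and redundancy are z-scored, while π(p) enters directly:

```javascript
// Population z-score; a zero spread falls back to sd = 1.
const z = (xs) => {
  const mean = xs.reduce((a, b) => a + b, 0) / xs.length;
  const sd =
    Math.sqrt(xs.reduce((a, b) => a + (b - mean) ** 2, 0) / xs.length) || 1;
  return xs.map((x) => (x - mean) / sd);
};

// S(s) = z(PR) + 0.4·z(IDF-mass) + 0.2·π(p) − 0.3·z(redundancy)
function soulScores(pr, idfMass, posPrior, redundancy) {
  const [zPr, zIdf, zRed] = [z(pr), z(idfMass), z(redundancy)];
  return pr.map(
    (_, i) => zPr[i] + 0.4 * zIdf[i] + 0.2 * posPrior[i] - 0.3 * zRed[i]
  );
}
```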
🌌 The Master Theory of Clarity
Only 1% of humans truly understand this. Once you do — no problem in life will catch you unprepared again. Read in order.