| Authors | Benet, Juan |
|---|---|
| Year | 2014 |
| Project | IPFS |
| License | MIT |
| Official Source | https://ipfs.io/ipfs/QmR7GSQM93Cx5eAg6a6yRzNde1FQv7uL6X1o4k7zrJa3LX/ipfs.draft3.pdf |
This page is an educational summary and analysis of an official whitepaper or technical paper, written for reference purposes. It is not a verbatim reproduction. CryptoGloss does not claim authorship of the original work. All intellectual property rights remain with the original author(s). The official document is linked above.
“IPFS — Content Addressed, Versioned, P2P File System” is the 2014 paper by Juan Benet of Protocol Labs introducing the InterPlanetary File System (IPFS) — a peer-to-peer distributed file system that replaces location-based addressing (URLs) with content-based addressing (cryptographic hashes). Under IPFS, you request data by what it is (its hash) rather than where it lives (its server). If the content hasn’t changed, the address hasn’t changed — making links permanent by construction.
IPFS synthesizes four pre-existing distributed systems:
- Kademlia DHT — peer discovery and routing
- BitTorrent — incentivized peer-to-peer file exchange (via Bitswap)
- Git — Merkle DAG-versioned content history
- Self-Certifying File System (SFS) — cryptographic naming
> Paper: Available at ipfs.io.
Publication and Context
In 2014, the dominant model for content distribution was centralized HTTP servers. HTTP addresses content by location — if the server goes down, the link breaks (link rot). Large files are served from single points of failure; popular content creates bandwidth bottlenecks on individual servers.
BitTorrent had demonstrated that peer-to-peer content distribution was more robust and efficient for large files, but it lacked a universal addressing scheme and had no built-in support for small files, versioning, or semantic naming.
IPFS proposed a unified protocol that could work for any file, at any scale, with permanent addresses.
Content Identifiers (CIDs)
Every object in IPFS has a Content Identifier (CID), a cryptographic hash of its content. The 2014 draft itself speaks of multihash-encoded hashes; the term CID came later.
For a file:
CID = multihash(content)
For a directory or structured object:
CID = multihash(serialized_MerkleDAG_node)
Consequences of content addressing:
- Two identical files have the same CID everywhere in the network — automatic deduplication
- If retrieved content hashes to the CID you asked for, it is exactly what you requested: automatic integrity verification
- Content never changes at a given CID — immutable by design; updates require new CIDs
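The properties above can be sketched in a few lines of Python. This is a minimal illustration, not the IPFS implementation: it hashes raw bytes with SHA-256, prepends the multihash prefix for sha2-256 (`0x12`, length `0x20`), and base58-encodes the result. The names `toy_cid` and `verify` are hypothetical, and because real IPFS hashes a serialized DAG node rather than the raw file bytes, the output will not match what an IPFS node reports for the same file.

```python
import hashlib

BASE58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def base58_encode(data: bytes) -> str:
    """Encode bytes with the Bitcoin-style base58 alphabet."""
    num = int.from_bytes(data, "big")
    encoded = ""
    while num > 0:
        num, rem = divmod(num, 58)
        encoded = BASE58_ALPHABET[rem] + encoded
    # Each leading zero byte is written as a leading '1'.
    leading_zeros = len(data) - len(data.lstrip(b"\x00"))
    return "1" * leading_zeros + encoded


def toy_cid(content: bytes) -> str:
    """Multihash of raw bytes: <0x12 = sha2-256><0x20 = 32 bytes><digest>."""
    digest = hashlib.sha256(content).digest()
    return base58_encode(bytes([0x12, 0x20]) + digest)


def verify(cid: str, content: bytes) -> bool:
    """Integrity check: re-hash the content and compare with the requested CID."""
    return toy_cid(content) == cid


cid = toy_cid(b"hello ipfs")
print(cid)                                # 46 characters, starting with "Qm"
print(verify(cid, b"hello ipfs"))         # True: same bytes, same address
print(verify(cid, b"hello ipfs (edit)"))  # False: changed bytes need a new CID
```

Deduplication falls out of the same function: two peers that add identical bytes compute the same identifier without coordinating.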
Merkle DAG
IPFS represents data as a Merkle Directed Acyclic Graph (Merkle DAG) — a generalization of Git’s commit graph and Merkle trees:
- Any structured data (directories, files, linked objects) is represented as a DAG where each node’s CID is the hash of its own data together with the CIDs of the children it links to
- Changes to any part of the structure produce new CIDs for all ancestor nodes (like Git commits)
- The structure is self-authenticating: verifying the root CID verifies the entire subtree
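A toy Merkle DAG makes the ancestor-propagation behaviour concrete. The sketch below is an assumption-laden simplification: nodes are serialized as JSON and identified by a hex SHA-256 digest, whereas IPFS uses its own object format and multihash. The helpers `node_id` and `add_tree` are hypothetical names introduced for illustration.

```python
import hashlib
import json


def node_id(data: bytes, links: dict) -> str:
    """Hash a node's payload together with its named links to children."""
    serialized = json.dumps(
        {"data": data.hex(), "links": links}, sort_keys=True
    ).encode()
    return hashlib.sha256(serialized).hexdigest()


def add_tree(tree, store: dict) -> str:
    """Store a file (bytes) or directory (nested dict) and return its ID."""
    if isinstance(tree, bytes):
        data, links = tree, {}
    else:
        data = b""
        links = {name: add_tree(child, store) for name, child in tree.items()}
    nid = node_id(data, links)
    store[nid] = (data, links)
    return nid


store = {}
v1 = add_tree({"readme": b"hi", "src": {"main.py": b"print(1)"}}, store)
v2 = add_tree({"readme": b"hi", "src": {"main.py": b"print(2)"}}, store)
v1_links, v2_links = store[v1][1], store[v2][1]

print(v1 != v2)                                  # True: the root changed
print(v1_links["src"] != v2_links["src"])        # True: so did the edited file's parent
print(v1_links["readme"] == v2_links["readme"])  # True: the untouched file is shared
```

Editing one file changes the identifiers of that file and every directory above it, while the unchanged `readme` node is stored once and shared by both versions.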
Kademlia DHT — Finding Peers
To actually retrieve a file given a CID, IPFS must find which peers have it. IPFS uses a Kademlia Distributed Hash Table for peer routing:
- Each node has a NodeId, the cryptographic hash of its public key
- The DHT maps CIDs to the set of peers that claim to have the content
- Routing uses the XOR distance metric: a lookup reaches a peer for any CID in O(log n) hops in a network of n peers
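The XOR metric itself is small enough to show directly. The following sketch uses 4-bit integer IDs so the arithmetic can be checked by hand; real deployments use hash-sized IDs. The function names (`xor_distance`, `closest_peers`, `bucket_index`) are illustrative, and the sketch covers only the distance ordering, not the iterative network lookup.

```python
def xor_distance(a: int, b: int) -> int:
    """Kademlia distance between two IDs: their bitwise XOR, read as an integer."""
    return a ^ b


def closest_peers(target: int, peers: list, k: int = 2) -> list:
    """Return the k known peers whose IDs are nearest to the target key."""
    return sorted(peers, key=lambda peer: xor_distance(peer, target))[:k]


def bucket_index(self_id: int, other_id: int) -> int:
    """Routing-table bucket for a peer: position of the highest differing bit."""
    return (self_id ^ other_id).bit_length() - 1


peers = [0b0001, 0b0100, 0b0111, 0b1100]
target = 0b0101  # stands in for the hash of some content

print(closest_peers(target, peers))  # [4, 7]: distances 1 and 2
print(bucket_index(0b0101, 0b0100))  # 0: the IDs differ only in the lowest bit
print(bucket_index(0b0101, 0b1100))  # 3: the IDs differ in the highest bit
```

Because each bucket covers IDs that share one more prefix bit with the local node, every hop can roughly halve the remaining distance, which is where the logarithmic hop count comes from.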
Bitswap — Exchange Protocol
Once peers are found, IPFS uses Bitswap (inspired by BitTorrent) for data exchange:
- Nodes maintain a want list (blocks they need) and a have list (blocks they can offer)
- Exchange is negotiated peer-to-peer; nodes preferentially serve peers who have served them
- Unlike BitTorrent, Bitswap is not tied to a specific torrent/file — it’s a general block exchange protocol
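The want-list and reciprocity ideas can be sketched as a per-peer ledger. Everything below is an assumption made for illustration: the `Ledger` and `Node` classes, the debt-ratio formula, and the "lowest ratio first" policy are a simplified stand-in for a Bitswap strategy, not the protocol's actual decision logic.

```python
from dataclasses import dataclass, field


@dataclass
class Ledger:
    bytes_sent: int = 0  # bytes we have sent to this peer
    bytes_recv: int = 0  # bytes this peer has sent to us

    def debt_ratio(self) -> float:
        # Higher ratio: the peer has taken more than it gave.
        # The +1 avoids division by zero for brand-new peers.
        return self.bytes_sent / (self.bytes_recv + 1)


@dataclass
class Node:
    have: dict = field(default_factory=dict)     # block ID -> block bytes
    ledgers: dict = field(default_factory=dict)  # peer name -> Ledger

    def serve_order(self, want_lists: dict) -> list:
        """Peers that want a block we hold, most reciprocal peers first."""
        candidates = [
            peer for peer, wants in want_lists.items()
            if any(block_id in self.have for block_id in wants)
        ]
        return sorted(
            candidates,
            key=lambda peer: self.ledgers.get(peer, Ledger()).debt_ratio(),
        )

    def send(self, peer: str, block_id: str) -> bytes:
        """Hand over a block and record the transfer in the peer's ledger."""
        block = self.have[block_id]
        self.ledgers.setdefault(peer, Ledger()).bytes_sent += len(block)
        return block


node = Node(
    have={"b1": b"x" * 100},
    ledgers={
        "alice": Ledger(bytes_sent=0, bytes_recv=500),  # has served us before
        "bob": Ledger(bytes_sent=1000, bytes_recv=0),   # has only taken so far
    },
)
want_lists = {"alice": ["b1"], "bob": ["b1"], "carol": ["b9"]}

print(node.serve_order(want_lists))       # ['alice', 'bob']; carol wants nothing we hold
node.send("alice", "b1")
print(node.ledgers["alice"].bytes_sent)   # 100
```

Note that the exchange is keyed by block ID only, which reflects the point above: nothing ties a block to a particular file or swarm.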
Filecoin integration: Bitswap’s informal social reciprocity is not strong enough for guaranteed storage retrieval. Filecoin (the companion Protocol Labs project) adds formal cryptographic incentives and storage proofs for persistent file storage.
IPNS: Mutable Naming Layer
CIDs are immutable by design. For mutable applications (websites that update, user profiles), IPFS provides IPNS (InterPlanetary Name System):
- A user generates a public/private key pair
- They sign a record mapping their IPNS name (the hash of their public key) to a CID
- Anyone can resolve the IPNS name to the current CID
- To update, the owner signs a new record with the same key, pointing the same name at a new CID
IPNS trades IPFS’s content-addressing permanence for mutability via key-based identity.
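The sign-and-resolve flow can be sketched with a deliberately insecure toy RSA key, chosen only so the example runs on the Python standard library. The key size, the record layout, the `seq` counter used to pick the newest record, and the placeholder CID strings are all assumptions for illustration; they are not the IPNS record format.

```python
import hashlib

# Toy RSA key built from two Mersenne primes. Far too small to be secure;
# it only stands in for a real signature scheme.
P, Q = 2**127 - 1, 2**89 - 1
N, E = P * Q, 65537
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (needs Python 3.8+)


def digest_int(message: bytes) -> int:
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N


def sign(message: bytes) -> int:
    return pow(digest_int(message), D, N)


def verify(message: bytes, signature: int) -> bool:
    return pow(signature, E, N) == digest_int(message)


def ipns_name() -> str:
    """The name is a hash of the public key (N, E), so it never changes."""
    return hashlib.sha256(f"{N}:{E}".encode()).hexdigest()


def make_record(cid: str, seq: int) -> dict:
    """Sign a pointer from the key's name to a CID."""
    return {"cid": cid, "seq": seq, "sig": sign(f"{seq}:{cid}".encode())}


def resolve(records: list) -> str:
    """Return the CID of the validly signed record with the highest seq."""
    valid = [
        r for r in records
        if verify(f"{r['seq']}:{r['cid']}".encode(), r["sig"])
    ]
    return max(valid, key=lambda r: r["seq"])["cid"]


first = make_record("QmOldVersion", seq=1)
update = make_record("QmNewVersion", seq=2)
forged = {"cid": "QmEvil", "seq": 3, "sig": 12345}  # not signed by the key

print(ipns_name() == ipns_name())        # True: the name is stable across updates
print(resolve([first, update, forged]))  # QmNewVersion
```

The name stays fixed because it depends only on the key, while the CID it points to changes with each newly signed record; a record without a valid signature is ignored regardless of its sequence number.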
Reality Check
IPFS is widely deployed and genuinely used for content addressing in the blockchain ecosystem (NFT metadata, ENS content records, Filecoin retrieval). However:
- Persistence is not guaranteed: IPFS does not ensure data is retained. Content disappears if no node “pins” it. Users often pin content on their own nodes or pay third-party pinning services (Pinata, Web3.Storage/Storacha).
- Retrieval performance: Finding and downloading content from IPFS is slower than HTTP for small files, especially for rarely-accessed content. DHT lookups add latency.
- NFT metadata reliability: Many “NFT on IPFS” implementations store metadata at IPFS CIDs that no one is pinning — effectively making the metadata inaccessible without a centralized gateway fallback.
- Draft status: The 2014 document is explicitly a draft. The IPFS spec has evolved substantially; the draft doesn’t describe modern IPFS (libp2p, CID v1, QUIC transport, etc.).
Legacy
IPFS is the most widely adopted content-addressed storage system in the blockchain ecosystem, used by Ethereum (ENS), Filecoin (retrieval layer), NFT platforms, and decentralized web hosting projects. The content-addressing concept has influenced blockchain data availability designs (Celestia, EigenDA) and decentralized CDN systems.
Research
- Benet, J. (2014). IPFS — Content Addressed, Versioned, P2P File System. Protocol Labs. arXiv:1407.3561.
— Primary whitepaper (draft). Section 3 (IPFS Design) defines the IPFS stack, including routing (3.3), the BitSwap block exchange (3.4), the Merkle DAG (3.5), and IPNS (3.7).
- Maymounkov, P., & Mazières, D. (2002). Kademlia: A Peer-to-Peer Information System Based on the XOR Metric. IPTPS 2002.
— Kademlia DHT; the peer discovery layer underlying IPFS’s content routing.
- Protocol Labs. (2017). Filecoin: A Decentralized Storage Network. Protocol Labs.
— The companion protocol adding cryptographic persistence guarantees to IPFS’s content addressing.