
Talk notes: How to Design a Good API and Why It Matters, Bloch

Generally, as a coder grows towards building larger systems and platforms, heuristics and good judgment supplant the rigid rules of any particular language. In particular, building a system's API is a major design task, and Joshua Bloch's talk codifies key advice infused with real-world experience.

In the Google Tech Talk, Bloch advises Googlers that if they only remember two things from his talk, those points are:

  1. When in doubt, leave it out (of the API).
  2. Don't make the client do anything the module could do.

There's a slight bias towards Java and object-oriented code in his advice, but it easily carries over to other languages and functional programming (methods → functions, classes → types). Below, I've organized my notes by slide and added timestamps should you want to skip around in the video.

Why is API Design Important?

  • Can be core company asset; deeply connects customers/users to the company.
    • But badly designed API very costly.
  • Public APIs are "forever" - one chance to get it right.
  • For coders: you are already an API designer. Good code is modular, and each module has an API; if it's good, it gets reused a lot.
    • 4:15 And, thinking in terms of APIs improves code quality. API design and language design are very similar: creating a tool for coders to express intent to machine. Also, we create platforms: language AND libraries.

Characteristics of a Good API 5:50

  • Easy to learn
  • Easy to use, even without documentation
  • Hard to misuse
    • Should practically force you to do the right thing.
  • Easy to read and maintain code that uses the API
  • Sufficiently powerful to satisfy requirements
    • Not necessarily absolutely powerful, just appropriately so.
  • Easy to extend/evolve
  • Appropriate to audience
    • Use right vocabulary and concepts.

I. The Process of API Design 7:45

  1. Gather requirements, with a healthy degree of skepticism. 7:50
    • Distinguish between proposed solutions and true requirements.
      • Better solutions than those first proposed may well exist.
    • True requirements - should be in form of use-cases.
    • It can actually be easier and more rewarding to build something more general.
  2. Start w/ short spec - 1 Page is ideal 9:58
    • Starting out, agility trumps completeness
      • Worst thing you can do is have 6 guys in a room working for 6 months on a spec.
    • Bounce 1-page spec off as many people as possible
    • Keep it short => easier to modify
    • Flesh out w/ actual coding - this necessarily involves coding to the API you're defining.
      • 14:40ish, he works through an old API and finds it's not quite intuitive.
  3. Write to your API early and often 15:03
    • Start before you've implemented the API
      • Saves you from writing an implementation you'll throw away
    • Start before you've even specified it properly
      • Saves you from writing specs you'll throw away
    • Continue writing to API as you flesh it out
      • Prevents nasty surprises.
    • Code lives on as examples/unit-tests - among the most important code you'll ever write
      • If you get this right, you "seed the market" with good examples early on.
      • "Example programs should be exemplary."
      • Rule of thumb "Spend 10x as much time on every line of example code as production code"
  4. Writing to SPI is even more important 17:25
    • SPI = Service Provider Interface, a plug-in interface enabling multiple implementations
      • e.g. Java Cryptography Extension
      • Hide very, very different implementations underneath the SPI.
    • Write multiple plug-ins before release
      • If you write one, it probably won't support another
      • If you write two, it will support more with difficulty
      • If you write three, it will work fine for any number
      • Will Tracz calls this "The Rule of Threes"
        • Confessions of a Used Program Salesman, 1995
    • Me: see Wikipedia http://en.wikipedia.org/wiki/Plug-in_(computing)
  5. Maintain realistic expectations 19:13
    • Most API designs are over-constrained
      • You won't be able to please everyone,
      • So aim to displease everyone equally. If all stakeholders are less than 100% happy, but happy enough, that's a good sign.
        • Don't mistake this for design-by-committee; you still need a strong visionary leader.
    • Expect to make mistakes - API will evolve with real-world usage/experience.
    • Bloch discusses his Collections API mistakes.
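
The SPI advice in item 4 can be sketched as follows. This is a minimal illustration, not anything from the talk: `TextEncoder`, `Encoders`, and the three provider classes are hypothetical names. The point is that three deliberately different providers get written against the SPI before release, and the client-facing API never exposes a concrete provider.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical SPI: the interface a plug-in author implements.
interface TextEncoder {
    String name();
    String encode(String input);
}

// Three deliberately different providers, written before "release"
// to check that the SPI does not quietly favor one of them.
final class UpperCaseEncoder implements TextEncoder {
    public String name() { return "upper"; }
    public String encode(String input) { return input.toUpperCase(Locale.ROOT); }
}

final class ReverseEncoder implements TextEncoder {
    public String name() { return "reverse"; }
    public String encode(String input) { return new StringBuilder(input).reverse().toString(); }
}

final class Rot13Encoder implements TextEncoder {
    public String name() { return "rot13"; }
    public String encode(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (char c : input.toCharArray()) {
            if (c >= 'a' && c <= 'z') out.append((char) ('a' + (c - 'a' + 13) % 26));
            else if (c >= 'A' && c <= 'Z') out.append((char) ('A' + (c - 'A' + 13) % 26));
            else out.append(c);
        }
        return out.toString();
    }
}

// Client-facing API: callers pick a provider by name and never see its class.
// (Not thread-safe; a real registry would need to consider that.)
final class Encoders {
    private static final Map<String, TextEncoder> PROVIDERS = new LinkedHashMap<>();

    static {
        register(new UpperCaseEncoder());
        register(new ReverseEncoder());
        register(new Rot13Encoder());
    }

    private Encoders() {}

    static void register(TextEncoder provider) {
        PROVIDERS.put(provider.name(), provider);
    }

    static String encode(String providerName, String input) {
        TextEncoder provider = PROVIDERS.get(providerName);
        if (provider == null) {
            throw new IllegalArgumentException("No such encoder: " + providerName);
        }
        return provider.encode(input);
    }
}
```

A client would call something like `Encoders.encode("rot13", "Hello")`. If a fourth provider cannot be added through `register` without touching `TextEncoder`, that is the kind of surprise the Rule of Threes is meant to surface early.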

II. General Principles 21:47

  1. API should do one thing and do it well 21:52
    • Functionality should be easy to explain
      • If it's hard to name, that's a warning sign
        • "The names are the API talking back to you, so listen carefully."
      • Good names drive good development and good design
        • e.g. good names instantly communicate what they are: Font, Set, PrivateKey, Lock, ThreadFactory, TimeUnit, Future
        • e.g. bad names: DynAnyFactoryOperations, BindingIteratorImplBase, ENCODINGCDRENCAPS, OMGVMCID
      • Be amenable to splitting and merging modules
  2. API should be as small as possible but no smaller 24:19
    • When in doubt, leave it out: functionality, classes, methods, parameters, etc.
      • "If you only remember one thing from the talk, remember this!"
      • Because you can always add, but you can never remove.
    • Conceptual weight more important than bulk; look for good power-to-weight ratio.
      • "What's really important is the number of concepts."
      • Re-use interfaces to keep conceptual weight down.
      • "You want to be able to do a lot, without learning a lot."
  3. Implementation should not impact API 26:18
    • Implementation details
      • confuse users,
      • inhibit freedom to change
    • Be aware of what is an implementation detail
      • Don't over-specify the behavior of methods/functions
      • e.g. do not specify hash functions
      • All tuning parameters are suspect
    • Don't let implementation details leak into API 28:31
      • e.g. on-disk and on-the-wire formats; exceptions
  4. Minimize accessibility of everything 29:26
    • Make classes and members as private as possible
    • Public classes should have no public fields
      • With the exception of constants
    • Maximize information hiding [Parnas]
    • Minimize coupling
      • Allows modules to be used, understood, built, tested, and debugged independently.
  5. Names matter - API is a little language 30:28
    • Names should be largely self-explanatory
      • Idea: API is a little language
      • Avoid cryptic abbreviations
    • Be consistent: same word means same thing throughout API, and across APIs on the platform
    • Be regular: strive for symmetry
      • e.g. add/remove both defined
    • If you get it right, code should read like prose
           if (car.speed() > 2 * SPEED_LIMIT)
               speaker.generateAlert("Watch out for cops!");
      
  6. Documentation matters 32:22

    Reuse is something that is far easier to say than to do. Doing it requires both good design and very good documentation. Even when we see good design, which is still infrequently, we won't see the components reused without good documentation.

    — D. L. Parnas, Software Aging. Proceedings of 16th International Conference Software Engineering, 1994

    • Document religiously
      • Document every class, interface, method, constructor, parameter, and exception
        • Class: what an instance represents
        • Method: contract between method and its client: preconditions, postconditions, and esp. side-effects
        • Parameter: indicate units (e.g. MB, GB), form (e.g. XML?), ownership of object post-method
      • Document state space very carefully
        • Where/when/how does mutation happen?
  7. Consider performance consequences of API design decisions 35:33
    • Premature optimization may be bad, but you can't just ignore performance.
    • Bad decisions can limit performance
      • e.g. making type mutable; providing constructor instead of static factory; using implementation type instead of interface
    • Do not warp API to gain performance
      • Underlying performance issue will get fixed, but headaches will be with you forever.
      • Good design usually coincides with good performance
    • Effects of API design decisions on performance are real and permanent 36:42
      • e.g. Java AWT method that returns mutable Dimension object => millions of needless object allocations
  8. API must coexist peacefully with platform 38:07
    • Do what is customary
      • Obey standard naming conventions,
      • Avoid obsolete parameters and return types,
      • Mimic patterns in core APIs and the language itself
    • Use API-friendly features: e.g. generics, varargs, enums, default arguments
    • Know and avoid API traps and pitfalls: e.g. finalizers, public static final arrays
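
The "public static final arrays" trap from item 8 is easy to show in a few lines. In this sketch `LeakyPalette` and `SafePalette` are hypothetical class names; the fix shown (a private array behind an unmodifiable list) is one common option, not the only one.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Trap: the reference is final, but the array contents are not,
// so any client can rewrite the "constant".
final class LeakyPalette {
    public static final String[] COLORS = {"red", "green", "blue"};

    private LeakyPalette() {}
}

// One possible fix: keep the array private and expose a read-only view.
final class SafePalette {
    private static final String[] COLORS_ARRAY = {"red", "green", "blue"};

    public static final List<String> COLORS =
            Collections.unmodifiableList(Arrays.asList(COLORS_ARRAY));

    private SafePalette() {}
}
```

With the first class, `LeakyPalette.COLORS[0] = "pink"` compiles and silently changes the value for every caller. With the second, `SafePalette.COLORS.set(0, "pink")` throws `UnsupportedOperationException`.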

III. Class Design 39:43

  1. Minimize mutability 39:51
    • Classes should be immutable unless there's a good reason to do otherwise
      • Pros: simple, thread-safe, reusable
      • Only disadvantage: separate object for each value
    • If mutable: make them as immutable as possible
      • keep state-space small and well-defined; make it obvious when it's legal to call which method.
    • e.g. bad: Date, Calendar; good: TimerTask, which minimizes mutability
  2. Subclass only where it makes sense 42:01
    • Subclassing implies substitution (see Liskov substitution principle), so use only when "is-a" relationship exists.
      • Otherwise, use composition.
    • Public classes should not subclass other public classes for ease of implementation.
    • e.g. bad: Properties extends Hashtable; or Stack extends Vector
      • Me: see how substitution does not hold for the above.
    • e.g. good: Set extends Collection
  3. Design and document for inheritance or else prohibit it 43:25
    • Inheritance violates encapsulation (per Snyder 1986): the subclass is sensitive to implementation details of its superclass.
    • If you allow subclassing, document self-use: how do methods use one another?
    • Conservative policy: all concrete classes final.
    • e.g. bad: many concrete classes in J2SE libraries "We got this wrong in many places."
    • e.g. good: AbstractSet, AbstractMap documented and designed for inheritance.
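
To see why substitution fails for the two "bad" examples in item 2, here is a small sketch. `SubstitutionDemo` and `StringStack` are hypothetical names introduced for illustration; the `Properties` and `Stack` calls are the standard `java.util` ones.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Properties;
import java.util.Stack;

final class SubstitutionDemo {
    private SubstitutionDemo() {}

    // Properties inherits put(Object, Object) from Hashtable, so a caller can
    // store a non-String value that getProperty() then refuses to return.
    static String propertiesLeak() {
        Properties props = new Properties();
        props.put("port", 8080);           // legal only because of inheritance
        return props.getProperty("port");  // null: the stored value is not a String
    }

    // Stack inherits positional insertion from Vector, which bypasses LIFO order.
    static String stackLeak() {
        Stack<String> stack = new Stack<>();
        stack.push("first");
        stack.push("second");
        stack.add(0, "smuggled");          // inserted underneath everything else
        stack.pop();                       // removes "second"
        stack.pop();                       // removes "first"
        return stack.pop();                // the element that never went through push()
    }
}

// Composition instead: this stack exposes only stack operations,
// and the backing Deque stays an implementation detail.
final class StringStack {
    private final Deque<String> items = new ArrayDeque<>();

    void push(String item) { items.push(item); }
    String pop() { return items.pop(); }
    boolean isEmpty() { return items.isEmpty(); }
}
```

The composed `StringStack` has a smaller API, and nothing a client can call breaks its ordering guarantee.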

IV. Method Design

  1. Don't make the client do anything the module could do 45:12
    • This is the 2nd key takeaway from the talk.
    • Reduced need for boilerplate code,
      • Generally done via cut-and-paste.
      • Ugly, annoying, and error-prone
    • e.g. the W3C DOM API: lots of extra imports and exposed implementation; it should have started with the use-case of a user wanting to print an XML document.
  2. Don't violate the principle of least astonishment 48:32
    • User of API should never be surprised by behavior:
      • It's worth extra implementation effort
      • It's even worth reduced performance to adhere to this.
      • e.g. Thread interruption
  3. Fail fast: report errors as soon as possible after they occur 49:52
    • Compile time is best - static typing, generics
    • At runtime, first bad method invocation is best
      • fail early with garbage input
      • Method should be failure-atomic.
  4. Provide programmatic access to all data available in string form 51:54
    • Otherwise, clients will parse strings, which is painful and worse still, turns the string format into de facto API.
    • So anything returned as string, also have the ability to return that data in programmatic form.
    • e.g. the Java stack trace was initially available only as a string, which people did parse.
  5. Overload with care 53:13
    • Avoid ambiguous overloadings:
      • Multiple overloadings applicable to same actuals
      • Conservative policy: no two overloadings with the same number of arguments
    • Don't do it just because you can; often better to use a different name.
    • If you must provide ambiguous overloadings, ensure same behavior for same arguments.
  6. Use appropriate parameter and return types 54:15
    • Favor interface types over classes for input: provides flexibility, performance.
    • Use most specific possible input parameter type: moves error from runtime to compile time.
    • Don't use string if a better type exists: strings are cumbersome, error-prone, and slow. e.g. data from the web: just because it starts as a string, don't leave it as such!
    • Don't use floating point for monetary values: binary f.p. causes inexact results!
    • Use double (64 bits) rather than float (32 bits): precision loss is real, while performance loss is negligible.
  7. Use consistent parameter ordering across methods 55:57
    • Especially important if parameter types are identical
    • e.g. java.util.Collections: first param is always the collection to be modified/queried
    • e.g. java.util.concurrent: time always specified as long delay, TimeUnit unit
  8. Avoid long parameter lists 57:05
    • 3 or fewer parameters is ideal; more and users must consult docs.
    • Long lists of identically typed params harmful: easy to transpose params by mistake, and programs still compile and run but screw up!
    • Two ways to shorten parameter lists:
      1. Break up method
      2. Create helper class to hold parameters, e.g. Builder pattern
  9. Avoid return values that demand exceptional processing 58:09
    • e.g. return zero-length array or empty collection, instead of null
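
Items 3 and 9 (fail fast, and no return values that demand exceptional processing) combine naturally in one small class. This is a sketch under assumptions: `PhoneBook` and its methods are hypothetical, chosen only to show the shape.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

final class PhoneBook {
    private final Map<String, List<String>> numbersByName = new HashMap<>();

    // Fail fast: bad input is rejected on the first call, before any state
    // changes, so a failed call leaves the object untouched (failure-atomic).
    void add(String name, String number) {
        Objects.requireNonNull(name, "name");
        Objects.requireNonNull(number, "number");
        if (number.isEmpty()) {
            throw new IllegalArgumentException("number must not be empty");
        }
        numbersByName.computeIfAbsent(name, k -> new ArrayList<>()).add(number);
    }

    // No special case for callers: an unknown name yields an empty list, never null.
    List<String> numbersFor(String name) {
        List<String> numbers = numbersByName.get(name);
        return numbers == null
                ? Collections.<String>emptyList()
                : Collections.unmodifiableList(numbers);
    }
}
```

Client code can then loop over `book.numbersFor(name)` without a null check, which removes the special-case code that callers tend to forget.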

V. Exception Design (Bloch ran out of time)

  1. Throw exceptions to indicate exceptional conditions
    • Don't force client to use exceptions for control flow.
    • Don't fail silently.
  2. Favor unchecked exceptions
    • Checked => client must take recovery action(s)
      • Overuse of checked exceptions causes boilerplate
    • Unchecked => programming error
  3. Include failure-capture information in exceptions
    • Allow diagnosis and repair or recovery
    • For unchecked exceptions, message suffices
    • For checked exceptions, provide accessors
    • Me: unclear what this means, but Bloch covers this more in his book, Effective Java
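
One plausible reading of "provide accessors" in item 3, sketched under assumptions: a checked exception signals a condition the caller is expected to recover from, so the data needed for recovery should be available as typed methods rather than buried in the message string. `InsufficientFundsException` and `Account` below are hypothetical names, not taken from the talk.

```java
// Hypothetical checked exception: the caller can recover, so the failure
// details are exposed through an accessor as well as the message.
final class InsufficientFundsException extends Exception {
    private final long shortfallCents;

    InsufficientFundsException(long shortfallCents) {
        super("Insufficient funds: short by " + shortfallCents + " cents");
        this.shortfallCents = shortfallCents;
    }

    long getShortfallCents() { return shortfallCents; }
}

final class Account {
    private long balanceCents;  // integer cents, not floating point

    Account(long openingBalanceCents) {
        if (openingBalanceCents < 0) {
            // Unchecked: a negative opening balance is a programming error,
            // and a descriptive message is enough for diagnosis.
            throw new IllegalArgumentException(
                    "negative opening balance: " + openingBalanceCents);
        }
        this.balanceCents = openingBalanceCents;
    }

    void withdraw(long amountCents) throws InsufficientFundsException {
        if (amountCents < 0) {
            throw new IllegalArgumentException("negative amount: " + amountCents);
        }
        if (amountCents > balanceCents) {
            // Checked: the balance is left unchanged and the caller learns
            // exactly how much is missing.
            throw new InsufficientFundsException(amountCents - balanceCents);
        }
        balanceCents -= amountCents;
    }

    long balanceCents() { return balanceCents; }
}
```

A caller catching `InsufficientFundsException` can use `getShortfallCents()` to offer a smaller withdrawal instead of parsing the message, which ties back to "provide programmatic access to all data available in string form."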

VI. Refactoring API Designs (Bloch ran out of time)

  1. e.g. Sublist operations in a class "Vector"
    • starting methods:
      • indexOf(Object elem, int index)
      • lastIndexOf(Object elem, int index)
      • Not very powerful, supports only searching; hard to use w/o documentation.
    • Refactor
      • public interface List, with method subList(int fromIndex, int toIndex)
      • extremely powerful, supports all operations
      • use of interface reduces conceptual weight, high power-to-weight
      • easy to use w/o documentation
  2. e.g. Thread-local variables
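
The sublist refactoring in item 1 can be illustrated with the standard `List.subList` view. `SubListDemo` and its helper methods are hypothetical wrappers added only to make the sketch self-contained; real client code would usually call `subList` inline.

```java
import java.util.Collections;
import java.util.List;

final class SubListDemo {
    private SubListDemo() {}

    // Range search without a dedicated indexOf(elem, index) overload:
    // returns the index in the full list, or -1 if elem is not in [from, to).
    static <T> int indexOfInRange(List<T> list, T elem, int from, int to) {
        int i = list.subList(from, to).indexOf(elem);
        return i < 0 ? -1 : from + i;
    }

    // The same view supports operations the old search-only methods never offered.
    static <T> void removeRange(List<T> list, int from, int to) {
        list.subList(from, to).clear();
    }

    static <T extends Comparable<? super T>> void sortRange(List<T> list, int from, int to) {
        Collections.sort(list.subList(from, to));
    }
}
```

Because `subList` returns another `List`, every existing list operation works on a range for free, which is the power-to-weight argument from the notes: one new method, no new concepts.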

Conclusion

  • API design is a noble and rewarding craft: improves lives of programmers, end-users, companies.
  • This talk covered some heuristics of the craft: don't slavishly follow, but don't violate them without good reason.
  • API design is tough: not a solitary activity; perfection not achievable, but try anyway.

The Bumper-sticker Version

From a post on InfoQ, Dr. Bloch condenses his advice to a 1-pager (or close to it). Note that I've added numbering to his maxims, otherwise what follows is entirely his.

My conference session How to Design a Good API and Why it Matters has always drawn large crowds; on InfoQ was the third most viewed content last year. When I presented this session as an invited talk at OOPSLA 2006, I was given the opportunity to write an abstract for the proceedings. In place of an ordinary abstract I decided to try something a bit unusual: I distilled the essence of the talk down to a modest collection of pithy maxims, in the spirit of Jon Bentley's classic Bumper-Sticker Computer Science, Item 6 in his excellent book, More Programming Pearls: Confessions of a Coder (Addison-Wesley, 1988).

It is my hope that these maxims provide a concise summary of the key points of API design, in easily digestible form:

  1. All programmers are API designers. Good programs are modular, and intermodular boundaries define APIs. Good modules get reused.
  2. APIs can be among your greatest assets or liabilities. Good APIs create long-term customers; bad ones create long-term support nightmares.
  3. Public APIs, like diamonds, are forever. You have one chance to get it right so give it your best.
  4. APIs should be easy to use and hard to misuse. It should be easy to do simple things; possible to do complex things; and impossible, or at least difficult, to do wrong things.
  5. APIs should be self-documenting: It should rarely require documentation to read code written to a good API. In fact, it should rarely require documentation to write it.
  6. When designing an API, first gather requirements—with a healthy degree of skepticism. People often provide solutions; it's your job to ferret out the underlying problems and find the best solutions.
  7. Structure requirements as use-cases: they are the yardstick against which you'll measure your API.
  8. Early drafts of APIs should be short, typically one page with class and method signatures and one-line descriptions. This makes it easy to restructure the API when you don't get it right the first time.
  9. Code the use-cases against your API before you implement it, even before you specify it properly. This will save you from implementing, or even specifying, a fundamentally broken API.
  10. Maintain the code for use-cases as the API evolves. Not only will this protect you from rude surprises, but the resulting code will become the examples for the API, the basis for tutorials and tests.
  11. Example code should be exemplary. If an API is used widely, its examples will be the archetypes for thousands of programs. Any mistakes will come back to haunt you a thousand fold.
  12. You can't please everyone so aim to displease everyone equally. Most APIs are overconstrained.
  13. Expect API-design mistakes due to failures of imagination. You can't reasonably hope to imagine everything that everyone will do with an API, or how it will interact with every other part of a system.
  14. API design is not a solitary activity. Show your design to as many people as you can, and take their feedback seriously. Possibilities that elude your imagination may be clear to others.
  15. Avoid fixed limits on input sizes. They limit usefulness and hasten obsolescence.
  16. Names matter. Strive for intelligibility, consistency, and symmetry. Every API is a little language, and people must learn to read and write it. If you get an API right, code will read like prose.
  17. If it's hard to find good names, go back to the drawing board. Don't be afraid to split or merge an API, or embed it in a more general setting. If names start falling into place, you're on the right track.
  18. When in doubt, leave it out. If there is a fundamental theorem of API design, this is it. It applies equally to functionality, classes, methods, and parameters. Every facet of an API should be as small as possible, but no smaller. You can always add things later, but you can't take them away. Minimizing conceptual weight is more important than class- or method-count.
  19. Keep APIs free of implementation details. They confuse users and inhibit the flexibility to evolve. It isn't always obvious what's an implementation detail: Be wary of overspecification.
  20. Minimize mutability. Immutable objects are simple, thread-safe, and freely sharable.
  21. Documentation matters. No matter how good an API, it won't get used without good documentation. Document every exported API element: every class, method, field, and parameter.
  22. Consider the performance consequences of API design decisions, but don't warp an API to achieve performance gains. Luckily, good APIs typically lend themselves to fast implementations.
  23. When in Rome, do as the Romans do. APIs must coexist peacefully with the platform, so do what is customary. It is almost always wrong to transliterate an API from one platform to another.
  24. Minimize accessibility; when in doubt, make it private. This simplifies APIs and reduces coupling.
  25. Subclass only if you can say with a straight face that every instance of the subclass is an instance of the superclass. Exposed classes should never subclass just to reuse implementation code.
  26. Design and document for inheritance or else prohibit it. This documentation takes the form of self-use patterns: how methods in a class use one another. Without it, safe subclassing is impossible.
  27. Don't make the client do anything the library could do. Violating this rule leads to boilerplate code in the client, which is annoying and error-prone.
  28. Obey the principle of least astonishment. Every method should do the least surprising thing it could, given its name. If a method doesn't do what users think it will, bugs will result.
  29. Fail fast. The sooner you report a bug, the less damage it will do. Compile-time is best. If you must fail at run-time, do it as soon as possible.
  30. Provide programmatic access to all data available in string form. Otherwise, programmers will be forced to parse strings, which is painful. Worse, the string forms will turn into de facto APIs.
  31. Overload with care. If the behaviors of two methods differ, it's better to give them different names.
  32. Use the right data type for the job. For example, don't use string if there is a more appropriate type.
  33. Use consistent parameter ordering across methods. Otherwise, programmers will get it backwards.
  34. Avoid long parameter lists, especially those with multiple consecutive parameters of the same type.
  35. Avoid return values that demand exceptional processing. Clients will forget to write the special-case code, leading to bugs. For example, return zero-length arrays or collections rather than nulls.
  36. Throw exceptions only to indicate exceptional conditions. Otherwise, clients will be forced to use exceptions for normal flow control, leading to programs that are hard to read, buggy, or slow.
  37. Throw unchecked exceptions unless clients can realistically recover from the failure.
  38. API design is an art, not a science. Strive for beauty, and trust your gut. Do not adhere slavishly to the above heuristics, but violate them only infrequently and with good reason.