In your Semantics Matter talk, you expressed some doubt that dynamic languages allow you to provide meaning to code, and suggested we should therefore prefer languages like C# and Java because compile-time type checking provides a safety net. As someone who’s developed systems using C#, Java (1.4) and Python, I found myself feeling uneasy about the suggestion. I spoke to you briefly in the bar before I had to leave, about APIs, and you suggested types help guide less experienced developers. Unfortunately I didn’t have more time to talk, but these are the thoughts I had about it on the way home.
I realise my experience of Java isn’t as recent as yours, but I hope you take the examples provided as illustrations rather than best practice.
I can see how types help developers understand an API, and I can see the benefits of strongly typed rather than “stringly typed” interfaces. My problem is that typing isn’t nearly enough. As you showed with a number of Date and string examples, we still have to test, and add contracts, to ensure that a value not only matches the type but isn’t null, that an ID key matches a known value, that a credit card number is the right length and passes its checksum, and a number of other validations that typing doesn’t capture, because the developer has to layer meaning onto the data.
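To make the credit card case concrete, here is a sketch of the Luhn checksum in Python – the kind of constraint a type declaration like `string` or even `CardNumber` cannot express on its own:

```python
def luhn_valid(number: str) -> bool:
    """Check length and Luhn checksum -- a validity rule that lives
    entirely outside what a static type declaration can state."""
    if not number.isdigit() or not 12 <= len(number) <= 19:
        return False
    total = 0
    # Walk the digits right-to-left, doubling every second one.
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

print(luhn_valid("4111111111111111"))  # True  (a well-known test number)
print(luhn_valid("4111111111111112"))  # False (one digit changed)
```

Both arguments have exactly the same type; only the run-time check separates a card number from sixteen arbitrary digits.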
Typing hides meaning. It beats semantics into a corner – UTF-8-encoded text, ASCII-encoded text and object identifiers can all be strings, and their meaning is subsumed into their type. But the only time we can identify a value’s real type is in its run-time context: the user typed something into the Address field rather than the Postcode field. It doesn’t have a semantic type, and it doesn’t have meaning, until we validate it. In this way, typing lulls you into a false sense of security. XML Schema provides slightly better types – it is strict, and lets you enforce regex, length and other constraints that make sense for a type – but it doesn’t solve the full problem: it still can’t tell you whether an email address or a postcode is valid. They only have semantic meaning as a locator once they’ve located you.
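As a sketch of that point: two values of the identical static type `str`, where only a run-time check (here a deliberately simplified postcode pattern, not the full Royal Mail rules) tells us which one means “postcode”:

```python
import re

# Simplified, illustrative UK-postcode shape -- an assumption for the
# example, not a complete implementation of the real addressing rules.
POSTCODE = re.compile(r"^[A-Z]{1,2}\d[A-Z\d]?\s*\d[A-Z]{2}$")

address_field = "221B Baker Street"
postcode_field = "NW1 6XE"

# Identical static type; different meaning, discoverable only at run time.
print(bool(POSTCODE.match(address_field)))   # False
print(bool(POSTCODE.match(postcode_field)))  # True
```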
More than that, types get in the way of meaning. Consider Java 1.4 and the horrible pre-generics casting that stripped all meaning from objects in a list (mostly resolved now with generics and array covariance, though those have problems of their own). That casting also discarded the type information. Even with generics, we need to define an IComparer before we can sort, which helps explain what the sort algorithm does, but loses the clarity of “but I can compare that, why can’t I sort it?”, because a type that can be compared may not inherit from the right parent. This is one particular strength of Python’s duck typing: it removes the need to define an interface and relies on the language semantics, allowing the logic to be expressed independently of the framework you would otherwise have to build. I remember this being a particular problem when I wanted to pass strongly-typed functions into a genetic algorithm library I was building in C++: the amount of scaffolding required to define the callbacks hid the meaning of the underlying problem I was trying to solve, whereas the Python version, by removing the typing and moving the validation into the callee, greatly simplified the interface.
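The duck-typing point can be shown in a few lines. A hypothetical `Version` class implements nothing but the one comparison the algorithm needs – no Comparable interface, no comparer registration – and `sorted` is satisfied:

```python
class Version:
    """An illustrative type: no interface declared, only the one
    operation the sort algorithm actually asks for."""
    def __init__(self, major: int, minor: int):
        self.major, self.minor = major, minor

    def __lt__(self, other: "Version") -> bool:
        # Tuple comparison gives lexicographic (major, minor) ordering.
        return (self.major, self.minor) < (other.major, other.minor)

    def __repr__(self) -> str:
        return f"Version({self.major}.{self.minor})"

releases = [Version(2, 0), Version(1, 4), Version(1, 10)]
print(sorted(releases))  # sorted() only asks: can these be compared?
```

If it can be compared, it can be sorted – the question the interface-based design struggles to answer cleanly.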
The semantics of a type are at the wrong level to convey business meaning. I want a number to represent 1p, 1 Euro cent or the UK national debt. I shouldn’t have to care whether that’s an Integer, a BigInteger or a Decimal, so long as it can store the value and not fail on overflow or division. Making me care about the hardware means I mix the semantics of the business logic with those of the implementation, forcing me to hold two semantic models in my head and making the overflow mistakes you highlighted more likely. In Python, the model fails under division but otherwise keeps out of my way. Python’s dynamic typing means it can automatically promote a number from a machine integer to a big integer when an operation would otherwise overflow, isolating my code from the hardware representation and allowing the interpreter to choose the most efficient one – in the same way that HotSpot can allocate certain objects on the stack where it improves performance. Implementation details move out of the developer’s head and into the compiler, letting the developer focus on business meaning.
I’m not writing a compiler; I don’t want to care about the semantics of the hardware because I can’t guarantee them. If I write an Android app, do I care whether my hardware is little-endian or big-endian? No. So why should I care whether a number is 32-bit or 64-bit? In Python I don’t need to care; in C# it matters. And that’s wrong.
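A small demonstration of that isolation (the debt figure is an illustrative assumption, not a real number): Python’s `int` quietly outgrows the 64-bit word where a fixed-width C# `long` would overflow:

```python
# One type covers a penny and a national debt -- no cast, no overflow.
pennies = 1
national_debt_pennies = 2_700_000_000_000 * 100  # illustrative figure only

word_sized = 2**63 - 1          # largest signed 64-bit value
beyond = word_sized + 1         # no overflow: Python promotes silently
print(beyond)                   # 9223372036854775808
print(beyond.bit_length())      # 64 -- the value no longer fits a machine word
```

The representation is the interpreter’s problem, not mine.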
The types that matter aren’t the types the compiler checks. A max function won’t tell you that you can’t compare two IDs, or two parsed credit card numbers. Hungarian notation used to encode purpose helps (screenWidth, screenX versus viewportWidth, viewportX, for example), but without a language like Ada, or without encapsulating values in non-castable types, you cannot rely on the compiler to enforce it. You may as well be using defaults or string typing, and I know how much you love those.
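Even in Python that encapsulation can be sketched at run time. With two hypothetical wrapper classes (names are mine, for illustration), `max` happily orders raw ints whatever they denote, but refuses the wrapped IDs because no ordering is defined between them:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CustomerId:
    value: int

@dataclass(frozen=True)
class OrderId:
    value: int

# max() on raw ints: the comparison succeeds, the question is meaningless.
print(max(42, 17))  # 42 -- but were those customers or orders?

# max() on wrapped ids: no ordering defined, so it fails loudly.
try:
    max(CustomerId(42), OrderId(17))
except TypeError as exc:
    print("refused:", exc)
```

A static checker (mypy, or the C# compiler with distinct wrapper structs) could reject the second call before the program even runs, which is the enforcement Ada gives you for free.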
Is it better to have “something” than “nothing”? Surely static types are safer than dynamic types? You start with something, you know what it is, and you can deal with it. Only you don’t know what it is. Surely it’s cleaner and clearer if a variable of a less specific type is replaced by a more specific type once it has been validated? You can do that by creating new variables, but overwriting keeps the intent clear and prevents use of the invalid form once validation has been done. To replicate that behaviour in C# or Java, we’d have to encapsulate the target type in a class, use a constructor to validate, and fall back on Hungarian notation to distinguish the raw form from the validated one.
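That constructor-validates pattern looks something like this (a hypothetical `EmailAddress` wrapper with a deliberately simplified pattern): the constructor is the only way in, so merely holding an `EmailAddress` means the string passed validation, and the raw form can be overwritten and forgotten:

```python
import re

class EmailAddress:
    """Illustrative validated wrapper -- the constructor is the gate."""
    _PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")  # simplified check

    def __init__(self, raw: str):
        if not self._PATTERN.match(raw):
            raise ValueError(f"not an email address: {raw!r}")
        self.value = raw

field = "alice@example.com"    # a plain str: unvalidated input
field = EmailAddress(field)    # overwrite: the raw form is now unreachable
print(field.value)

try:
    EmailAddress("not-an-email")
except ValueError as exc:
    print(exc)                 # invalid input never becomes the type
```

In Python the rebinding of `field` is natural; in C# or Java the two forms need two differently-typed variables, which is where the Hungarian prefixes creep back in.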
If you care about semantics, static typing can get in the way, but that’s just my view.