No. All of this is breaking the primary rule of programming: KISS (keep it simple, stupid). Don't add unnecessary complexity. Avoid premature optimization. Tons of things are correctly booleans and should stay that way.
Turning boolean database values into timestamps is a weird hack that wastes space. Why do you want to record when an email was verified, but not when any other fields that happen to be strings or numbers or blobs were changed? Either implement proper event logging or not, but don't do some weird hack where only booleans get fake-logged but nothing else does.
Should booleans turn into enums when a third mutually-exclusive state gets added? Yes, of course, so go refactor, easy. But don't start with an enum before you need it. The same way we don't start with floats rather than ints "just in case" we need fractional values later on.
Booleans are a cornerstone of programming and logic. They're great. I don't know where this "booleans are bad" idea came from, but it's the opposite of communicating intention clearly in code. That boolean should probably stay a boolean unless there's an actual reason to change it.
lock1 16 hours ago [-]
Disagree. Given the current popularity of dynamic languages and the fact that many people don't understand the value of ADT, newtype pattern, C-like enum even in static languages, I'd argue booleans & primitives are way overused.
I think a lot of people misunderstand KISS, believing everything should be primitives or surface-level simplicity. Instead, I interpret "simple" not as something like golang's surface-level readability, but as infosec's "principle of least privilege". Pick the option that minimizes possible state and captures the requirement logic, rather than primitives just because they're "simple" or "familiar".
Even then, sometimes it's fine to violate it. In this case, a (nullable) date time might be preferable to a boolean for future-proofing purposes. It's trivial to optimize space by mapping date time to boolean, while it's a total pain to migrate from boolean to date time.
Also, doesn't "... a weird hack that wastes space" contradict "Avoid premature optimization"?
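To illustrate the "mapping date time to boolean" direction, here is a minimal Python sketch. The `User` class and its `verified_at` field are hypothetical names, not something from the article:

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class User:
    email: str
    # None means "not verified"; a value records when verification happened.
    verified_at: Optional[datetime] = None

    @property
    def is_verified(self) -> bool:
        # The boolean is derived, so it can never disagree with the timestamp.
        return self.verified_at is not None


user = User(email="a@example.com")
print(user.is_verified)  # False
user.verified_at = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(user.is_verified)  # True
```

Going the other way is the hard part: a stored `True` carries no information about when it was set.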
crazygringo 10 hours ago [-]
> Pick the option that minimizes possible state and captures the requirement logic
Which is what booleans do when the requirement is two states.
> Also, doesn't "... a weird hack that wastes space" contradict "Avoid premature optimization"?
No, because including the timestamp "just in case" is the premature optimization.
operator-name 15 hours ago [-]
I strongly disagree. Making invalid state unrepresentable is important.
> wastes space
> premature optimisation
A timestamp is a witness of when the email was verified. Since whether they’ve verified can be calculated from it, having both is not only redundant but allows invalid states to be represented.
Cases like email verified are often followed by the need to know when. Say an expiry system. Going from bools, you are faced with the hard choice of how to migrate existing state.
Databases also warrant more care as a source of persistent state - accessed by multiple versions of your software. If you don’t have this persistency, then it matters less.
> any other fields that happen to be strings or numbers or blobs were changed
> implement proper event logging
Event logging is orthogonal to your database state. If your business logic needs dirty flags or timestamps they should be stored in the database, not queried.
And if you do need it for other fields, adding the bool is the perfect time to ask yourself if what you need is a timestamp.
> way we don't start with floats rather than ints "just in case" we need fractional values later on
Floats are a subset of int, and a natural migration. A bool can be calculated from a timestamp, but not the other way.
crazygringo 9 hours ago [-]
> Cases like email verified are often followed by the need to know when.
And often not. That's the point. Avoid premature optimization. (FWIW, I've never encountered a system in my life where a successful email verification then expired after a period of time.)
> having both is not only redundant but allows invalid states to be represented.
That's a different topic. That's about what to do when you know you need the timestamp. The article is about using a timestamp when you don't have a timestamp requirement.
> they should be stored in the database, not queried.
I don't know what that means. Everything in the database is queried. And you can store your events in the database, in one or more event log tables.
> Floats are a subset of int, and a natural migration.
I think you meant to say the opposite, but even that's not true because of precision. And so too are enums a natural migration from booleans. That's the point -- start simple and extend as needed.
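A quick Python illustration of the precision point, assuming the usual IEEE 754 double (53-bit significand) that Python's `float` uses:

```python
# Python ints are arbitrary precision; floats are doubles with a 53-bit significand.
big = 2**53 + 1

print(float(big) == float(2**53))  # True: the +1 is lost in the conversion
print(int(float(big)) == big)      # False: the round trip does not restore it
```

So not every integer survives a trip through a float, which is why the migration is less "natural" than it looks.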
glxxyz 22 hours ago [-]
Yes the advice in TFA was brought to you by the sort of people who never finish anything because they're always wasting time thinking about potential future use cases that will never happen. Make it simple and extensible and make it satisfy today's requirements.
breadwinner 23 hours ago [-]
Disagree. KISS is for bigger things like architecture. Exposing an enum instead of a simple bool is a good idea that will save you time later. The only time to not do this is if you're exposing internal info, i.e., breaking encapsulation.
nostrademons 22 hours ago [-]
It saves you time until you realize that those status flags are orthogonal. It's very common for a job to be both is_started and is_queued, for example. And a simple is_failed status enum is problematic once you add retries, and can have a failed job enter the queue to be started again.
KISS, YAGNI, and then actually analyze your requirements to understand what the mature schema looks like. A boolean is the simplest thing that can possibly work, usually. Do that first and see how your requirements evolve, then build the database schema that reflects your actual requirements.
msgodel 22 hours ago [-]
Yeah it might be better to think of booleans as "the smallest possible integer type" and use enums (or whatever your language has) to represent more meaningful data.
Although it always depends on what exactly you're really doing.
scuderiaseb 14 hours ago [-]
Depends. If you're absolutely sure whatever you have has just two states, then by all means use a boolean. But booleans are harder to read than enum values, and you're skipping that just to save a few bytes in the DB?
This is not premature optimization, sometimes booleans can be extremely hard to change so it's not as easy as "just refactor".
OskarS 1 days ago [-]
A piece of advice I read somewhere early in my career was "a boolean should almost never be an argument to a function". I didn't understand what the problem was at the time, but then years later I started at a company with a large Lua code-base (mostly written by one-two developers) and there were many lines of code that looked like this:
serialize(someObject, true, false, nil, true)
What do those extra arguments do? Who knows; it's impossible to tell without looking at the function definition.
Basically, what had happened was that the developer had written a function ("serialize()", in this example) and then later discovered that they wanted slightly different behaviour in some cases (maybe pretty printed or something). Since Lua allows you to change arity of a function without changing call-sites (missing arguments are just nil), they had just added a flag as an argument. And then another flag. And then another.
I now believe very strongly that you should virtually never have a boolean as an argument to a function. There are exceptions, but not many.
account42 1 days ago [-]
But this isn't really a boolean problem - even in your example there is another mystery argument: nil
And you can get the same problem with any argument type. What do the arguments in
copy(objectA, objectB, "")
mean?
In general, you're going to need some kind of way to communicate the purpose - named parameters, IDE autocomplete, whatever - and once you have that then booleans are not worse than any other type.
OskarS 1 days ago [-]
You're correct in principle, but I'm saying that "in practice", boolean arguments are usually feature flags that change the behavior of the function in some way instead of being some pure value. And that can be really problematic, not least for testing, where you now aren't testing a single function, you're testing a combinatorial explosion's worth of functions with different feature flags.
Basically, if you have a function that takes a boolean in your API, just have two functions instead with descriptive names.
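A minimal Python sketch of that suggestion, reusing the `serialize` example from above. The function names and the JSON output format are assumptions for illustration only:

```python
import json


def serialize(obj) -> str:
    # Compact form: no whitespace between tokens.
    return json.dumps(obj, separators=(",", ":"))


def serialize_pretty(obj) -> str:
    # Human-readable form; the name says what the old flag used to say.
    return json.dumps(obj, indent=2)


data = {"a": 1}
print(serialize(data))  # {"a":1}
print(serialize_pretty(data))
```

Each function can then be tested on its own, instead of testing one function under every flag combination.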
hamburglar 1 days ago [-]
> Basically, if you have a function that takes a boolean in your API, just have two functions instead with descriptive names.
Yeah right like I’m going to expand this function that takes 10 booleans into 1024 functions. I’m sticking with it. /s
OrderlyTiamat 1 days ago [-]
If your function has a McCabe complexity higher than 1024, then boolean arguments are the least of your problems...
crazygringo 21 hours ago [-]
Not really.
Tons of well-written functions have many more potential code paths than that. And they're easy to reason about because the parameters don't interact much.
Just think of plotting libraries with a ton of optional parameters for showing/hiding axes, ticks, labels, gridlines, legend, etc.
The latter is how you should use such a function if you can't change it (and if your language allows it).
If this was my function I would probably make the parameters attributes of a TurboEncabulator class and add some setter methods that can be chained, e.g. Rust-style:
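A rough sketch of the chained-setter idea, written in Python rather than Rust. Only the `TurboEncabulator` name comes from the comment; the attribute and method names are made up for illustration:

```python
class TurboEncabulator:
    def __init__(self) -> None:
        # Every option starts with an explicit default.
        self.pretty_print = False
        self.sort_keys = False

    def with_pretty_print(self, enabled: bool) -> "TurboEncabulator":
        self.pretty_print = enabled
        return self  # returning self is what makes the calls chainable

    def with_sort_keys(self, enabled: bool) -> "TurboEncabulator":
        self.sort_keys = enabled
        return self


encabulator = TurboEncabulator().with_pretty_print(True).with_sort_keys(False)
print(encabulator.pretty_print, encabulator.sort_keys)  # True False
```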
Hopefully you could refactor it automatically into 1024 functions and then find out that 1009 of them are never called in the project, so you can remove them.
hamburglar 6 hours ago [-]
I think you might have missed the “/s”
8-prime 1 days ago [-]
True, but I think it's worth noting that inferring what a parameter could be is much easier if it's something other than a boolean.
You could of course store the boolean in a variable and have the variable name speak for its meaning, but at that point you might as well just use an enum and do it properly.
For things like strings you either have a variable name - ideally a well describing one - or a string literal which still contains much more information than simply a true or false.
nomel 21 hours ago [-]
If your language doesn't support named arguments, you can always name the value, with the usual mechanism:
debug_mode=True
some_func(..., debug_mode)
atoav 17 hours ago [-]
Well, that just means the function might be named wrong?
copy_from_to_by_key(objectA, objectB, "name")
Or, much better, you use named parameters, if your language supports it:
Or you could make it part of the object by declaring a method that could be used like this:
objectB.set_value_from(objectA, key="name")
StopDisinfo910 1 days ago [-]
Named arguments are a solution to precisely this issue. With optional arguments with default values, you get to do precisely what was being done in your Lua code but with self-documenting code.
I personally believe very strongly that people shouldn’t use programming languages lacking basic functionalities.
lukan 22 hours ago [-]
Or not use them without tooling?
I believe IDEs had the feature of showing me the function header with a mouse hover 20+ years ago.
fireflash38 12 hours ago [-]
It amazes me the contortions that Golang devs (me included) go through to get something approaching the keyword arguments from python.
It's honestly what I miss the most about python: keyword args, keyword-only args, positional-only args.
What I don't miss is the unholy abomination of **kwargs...
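For anyone who hasn't seen them, a short sketch of positional-only and keyword-only parameters in Python (3.8+). The `serialize` signature is hypothetical:

```python
def serialize(obj, /, *, pretty_print=False, sort_keys=False):
    # `obj` is positional-only (it comes before `/`);
    # the flags are keyword-only (they come after `*`), so call sites must name them.
    return (obj, pretty_print, sort_keys)


print(serialize({"a": 1}, pretty_print=True))  # ({'a': 1}, True, False)

try:
    serialize({"a": 1}, True)  # passing a flag positionally is rejected
except TypeError as exc:
    print("rejected:", exc)
```

This makes a call like `serialize(x, True, False)` impossible to write, which is the readability problem discussed upthread.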
kevin_thibedeau 22 hours ago [-]
You can also document the argument name inline for languages with block comments but no named args.
StopDisinfo910 15 hours ago [-]
You can but it’s not the same because the names of named arguments are also present at the call site.
_dain_ 1 days ago [-]
Named arguments don't stop the deeper problem, which is that N booleans have 2^N possible states. As N increases it's rare for all those combinations to be valid. Just figuring out the truth table might be challenging enough, then there's the question of whether the caller or callee is responsible for enforcing it. And either way you have to document and test it.
Enums are better because you can carve out precisely the state space you want and no more.
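A small sketch of the counting argument, using hypothetical job flags: three booleans admit eight combinations, while an enum lists only the states assumed to be valid.

```python
from enum import Enum
from itertools import product

# Three independent booleans: is_queued, is_started, is_failed (hypothetical flags).
flag_states = list(product([False, True], repeat=3))
print(len(flag_states))  # 8


class JobStatus(Enum):
    # Only the combinations assumed to be meaningful for this example.
    QUEUED = "queued"
    RUNNING = "running"
    SUCCEEDED = "succeeded"
    FAILED = "failed"


print(len(JobStatus))  # 4
```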
fluoridation 1 days ago [-]
That's not a problem per se. It may very well be that you're configuring the behavior of something with a bunch of totally independent on/off switches. Replacing n booleans with an enum with 2^n values is just as wrong as replacing a 5-valued enum with 3 booleans that cannot be validly set independently.
arethuza 1 days ago [-]
If you use keyword arguments then something like that doesn't look too bad:
serialize(someObject, prettyPrint:true)
NB I have no idea whether Lua has keyword arguments but if your language does then that would seem to address your particular issue?
OskarS 1 days ago [-]
Lua doesn't directly support keyword arguments, but you can simulate it using tables:
serialize(someObject, { prettyPrint = true })
And indeed that is a big improvement (and commonly done), but it doesn't solve all problems. Say you have X flags, then there's 2^X different configurations you have to check and test and so forth. In reality, all 2^X configurations will not be used, only a tiny fraction will be. In addition, some configurations will simply not be legal (i.e. if flag A is true, then flag B must be as well), and then you have a "make illegal states unrepresentable" situation.
If the tiny fraction is small enough, just write different functions for it ("serialize()" and "prettyPrint()"). If it's not feasible to do it, have a good long think about the API design and if you can refactor it nicely. If the number of combinations is enormous, something like the "builder pattern" is probably a good idea.
It's a hard problem to solve, because there's all sorts of programming principles in tension here ("don't repeat yourself", "make illegal states unrepresentable", "feature flags are bad") and in your way of solving a practical problem. It's interesting to study how popular libraries do this. libcurl is a good example, which has a GAZILLION options for how to do a request, and you do it "statefully" by setting options [1]. libcairo for drawing vector graphics is another interesting example, where you really do have a combinatorial explosion of different shapes, strokes, caps, paths and fills [2]. They also do it statefully.
Whatever language you are using, it probably has some namespaced way to define flags as `(1 << 0)` and `(1 << 1)` etc.
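In Python, for example, the standard library's `enum.IntFlag` provides such a namespaced set of bit flags. The flag names below are hypothetical:

```python
from enum import IntFlag


class SerializeFlags(IntFlag):
    PRETTY_PRINT = 1 << 0
    SORT_KEYS = 1 << 1
    ASCII_ONLY = 1 << 2


# Flags combine with bitwise OR and are tested with `in`.
flags = SerializeFlags.PRETTY_PRINT | SerializeFlags.ASCII_ONLY
print(SerializeFlags.PRETTY_PRINT in flags)  # True
print(SerializeFlags.SORT_KEYS in flags)     # False
print(int(flags))                            # 5
```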
arethuza 1 days ago [-]
If you really need all of that I think I'd go with a separate object holding all of the options:
options = new SerializeOptions();
options.PrettyPrint = true;
options.Flag2 = "red";
options.Flag3 = 27;
serialize(someObject, options);
vanviegen 1 days ago [-]
So 1 line of C/C++ becomes 5 lines of Java/C#? That sounds about right! :-) Though I'm sure we can get to 30 if we squeeze in an abstract factory or two!
wallstop 18 hours ago [-]
You can do the above in C#, I haven't written Java in a decade so can't comment on that. I don't really understand your argument though - the options approach is extremely readable. You can also do the options approach in C or C++. The amount of stuff that you can slap into one line is an interesting benchmark to use for languages.
dandersch 1 days ago [-]
It's always crazy to see languages like C being able to beat high-level languages at some ergonomics (which is usually their #1 point of pride) just because C has bitfields and they often don't.
RaftPeople 24 hours ago [-]
> The best way in many languages for flags is using unsigned integers that are bitwise-ORed together.
Why is that the "best" way?
lelanthran 15 hours ago [-]
> Why is that the "best" way?
"Best way" is often contextual and subjective. In this context (boolean flags to a function), this way is short, readable and scoped, even in C which doesn't even have scoped namespaces.
Maybe there are better ways, and maybe you have a different "best way", but then someone can legitimately ask you about your "best way": `Why is that the "best" way?`
The only objective truth that one can say about a particular way to do something is "This is not the worst way".
waste_monk 21 hours ago [-]
It's simple, efficient, and saves space in memory. While not as big a deal these days where most systems have plentiful RAM, it's still useful on things like embedded devices.
Why waste a whole byte on a bool that has one bit of data, when you can pack the equivalent of eight bools into the same space as a uint8_t for free?
RaftPeople 20 hours ago [-]
Sure, that works when trying to conserve memory to the degree that a few bytes matter, but the downside is that it's more complex, less obvious.
I've done exactly what you propose on different projects but I would never call it the "best" method, merely one that conserves memory but with typical trade-offs like all solutions.
drdec 22 hours ago [-]
I'm surprised nobody has suggested this yet. Just use a different name for the function. In your example, the new function should be prettyPrint(). No booleans required. No extra structures required.
mostlysimilar 1 days ago [-]
Something I really love in Elixir is that functions can be named identically and are considered different with different arity.
I'm not sure I understand how this is different from function overloading
0x3444ac53 1 days ago [-]
I think the answer to this (specific to Lua) is passing a table as an argument that gets unpacked.
Lyngbakr 1 days ago [-]
I don't know if this is where you read it, but this advice is also given in Clean Code.
OskarS 16 hours ago [-]
I don’t remember exactly where I read this, but I think it was some internet forum of some kind. It makes sense that whoever wrote it got it from there. Never read it myself.
Vinnl 1 days ago [-]
I die a little inside every time I write:
JSON.stringify(val, null, 2);
(So yes, but it goes beyond booleans. All optional parameters should be named parameters.)
nutjob2 1 days ago [-]
> I now believe very strongly that you should virtually never have a boolean as an argument to a function. There are exceptions, but not many.
Really? That sounds unjustified outside of some specific context. As a general rule I just can't see it.
I don't see what's fundamentally wrong with it. What's the alternative? Multiple static functions with different names corresponding to the flags and code duplication, plus switch statements to select the right function?
Or maybe you're making some other point?
fifticon 1 days ago [-]
The scope of TFA is data modelling, where it advises to use more descriptive data values, such as enums or happenedAtTimestamp.
However, personally I agree with the advice, in another context: Function return types, and if-statements.
Often, some critical major situation or direction is communicated with returned booleans.
They will indicate something like 'did-optimizer-pass-succeed-or-run-to-completion-or-finish',
stuff like that.
And this will determine how the program proceeds next (retry, abort, continue, etc.)
A problem arises when multiple developers (maybe yourself, in 3 months) need to communicate about and understand this correctly.
Sometimes, that returned value will mean
'function-was-successful'.
Sometimes it means 'true if there were problems/issues'
(the way to this perspective, is when the function is 'checkForProblems'/verify/sanitycheck() ).
Another way to make confusion with this, is when multiple functions are available to plug in or proceed to call - and people assume they all agree on "true is OK, false is problems" or vice versa.
A third and maybe most important variant, is when 'the return value doesn't quite mean what you thought'.
- 'I thought it meant "a map has been allocated".'
- but it means 'a map exists' (but has not necessarily been allocated, if it was pre-existing).
All this can be attacked with two-value enums,
NO_CONVERSION_FAILED=0, YES_CONVERSION_WAS_SUCCESSFUL=1 .
(and yes, I see the peril in putting 0 and 1 there, but any value will be dangerous..)
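A minimal sketch of the two-value enum idea in Python. The enum and function names are hypothetical, and `auto()` is used here to sidestep the question of which numeric value means what:

```python
from enum import Enum, auto


class ConversionResult(Enum):
    CONVERSION_FAILED = auto()
    CONVERSION_SUCCEEDED = auto()


def convert(text: str) -> ConversionResult:
    # Hypothetical conversion: succeeds when the text parses as an integer.
    try:
        int(text)
    except ValueError:
        return ConversionResult.CONVERSION_FAILED
    return ConversionResult.CONVERSION_SUCCEEDED


# The call site names the outcome it is checking for.
if convert("42") is ConversionResult.CONVERSION_SUCCEEDED:
    print("ok")
```

One caveat with this sketch: a plain `Enum` member is always truthy, so a bare `if convert(x):` would pass for both outcomes. Callers have to compare explicitly.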
1718627440 1 days ago [-]
That's why you have coding style guides and documentation. Both choices are "correct", you just need to be consistent.
Fraterkes 1 days ago [-]
I’m not a very experienced programmer, but the first example immediately strikes me as weird. The consideration for choosing types is often to communicate intent to others (and your future self). I think that’s also why code is often broken up into functions, even if the logic does not need to be modular / repeatable: the function signature kind of “summarizes” that bit of code.
Making a boolean a datetime, just in case you ever want to use the data, is not the kind of pattern that makes your code clearer in my opinion. The fact that you only save a binary true/false value tells the person looking at the code a ton about what the program currently is meant to do.
turboponyy 1 days ago [-]
I actually completely agree with both the article and your point that your code should directly communicate your intent.
The angle I'd approach it from is this: recording whether an email is verified as a boolean is actually misguided - that is, the intent is wrong.
The actual things of interest are the email entity and the verification event. If you record both, 'is_verified' is trivial to derive.
However, consider if you now must implement the rule that "emails are verified only if a verification took place within the last 6 months." Recording verifications as events handles this trivially, whilst this doesn't work with booleans.
Some other examples - what is the rate of verifications per unit of time? How many verification emails do we have to send out?
Flipping a boolean when the first of these events occurs without storing the event itself works in special cases, but not in general. Storing a boolean is overly rigid, throws away the underlying information of interest, and overloads the model with unrelated fields (imagine storing say 7 or 8 different kinds of events linked to some model).
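As a rough sketch of the six-month rule, assuming verification events are kept as a list of timestamps and approximating "6 months" as 182 days (both are assumptions for illustration):

```python
from datetime import datetime, timedelta, timezone

VERIFICATION_TTL = timedelta(days=182)  # assumed stand-in for "6 months"


def is_verified(verification_events: list[datetime], now: datetime) -> bool:
    # Verified only if some verification happened within the TTL window.
    return any(
        timedelta(0) <= now - ts <= VERIFICATION_TTL for ts in verification_events
    )


now = datetime(2025, 1, 1, tzinfo=timezone.utc)
events = [
    datetime(2024, 1, 1, tzinfo=timezone.utc),
    datetime(2024, 10, 1, tzinfo=timezone.utc),
]
print(is_verified(events, now))      # True: the October event is recent enough
print(is_verified(events[:1], now))  # False: the only event is a year old
```

The same list also answers the rate questions, since counting events per time bucket needs nothing beyond the stored timestamps.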
crazygringo 23 hours ago [-]
> that is, the intent is wrong. The actual things of interest are the email entity and the verification event.
Or, your assumption about the intent is wrong. Many (most?) times, the intent is precisely whether an email is verified. That's all. And that's OK if that's all the project needs.
> Storing a boolean is overly rigid, throws away the underlying information of interest, and overloads the model with unrelated fields
Also, storing a boolean can most accurately reflect intent, avoid hoarding unnecessary and unneeded information, and maximize the model's conceptual clarity.
bluGill 1 days ago [-]
In the case of a database you often can't fix mistakes so overdesign just in case makes sense. Many have been burned.
crazygringo 23 hours ago [-]
Probably even more have been burned by overdesign.
If you decided to make your boolean a timestamp, and now realize you need a field with 3 states, now what?
If you'd kept your boolean, you could convert the field from BOOL to TINYINT without changing any data. [0, 1] becomes [0, 1, 2] easily.
jandrewrogers 22 hours ago [-]
While I agree on the over-design point, it doesn't follow that the BOOL is trivially convertible to a TINYINT. In some databases a BOOL is stored as a single bit.
hahn-kev 1 days ago [-]
See always having a synthetic primary key
joshstrange 1 days ago [-]
Normally you'd name the field `created_at`, `updated_at`, or similar which I think makes it very clear.
> Making a boolean a datetime, just in case you ever want to use the data, is not the kind of pattern that makes your code clearer in my opinion.
I don't follow at all, if your field is named as when a thing happened (`_at` suffix) then that seems very clear. Also, even if you never expose this via UI it can be a godsend for debugging "Oh, it was updated on XXXX-XX-XX, that's when we had Y bug or that's why Z service was having an issue".
burnt-resistor 18 hours ago [-]
The author doesn't consider the nuanced trade-offs of where, what kind, and how much data to persist, compute, and where to store it, whether in the database or modeled in code. It's a quite superficial article bordering on meaninglessness that doesn't expound on the considerations of thoughtful engineering for the stakeholders: swe maintenance, operations, and business/user needs. It should lead into asking questions rather than present "the" answer.
bsoles 1 days ago [-]
This is such weird advice, and it seems to come from a particular experience of software development.
How about using Booleans for binary things? Is the LED on or off, is the button pressed or not, is the microcontroller pin low or high? Using Enums, etc. to represent those values in the embedded world would be a monumental waste of memory, where a single bit would normally suffice.
aDyslecticCrow 1 days ago [-]
The boolean type is the massive waste, not the enum. A boolean in C is just a full int. So it's definitely not a waste to use an enum, which is also an int.
And usually you use operations to isolate the bit from a status byte or word, which is how it's also stored and accessed in registers anyway.
So it's still not a boolean type despite expressing boolean things.
Enums also help keep the state machine clear. {Init, on, off, error} capture a larger part of the program behavior in a clear format than 2-3 binary flags, despite describing the same function. Every new boolean flag is a two state composite state machine hiding edgecases.
glxxyz 22 hours ago [-]
Not necessarily a waste in all languages. A C++ `std::vector<bool>` efficiently packs bits for example, although it does have its own 'issues'.
aDyslecticCrow 17 hours ago [-]
I kinda hate that. It gives the vector very special behaviour for one type in particular, going against the intuition behind how both boolean and vector work everywhere else in the language.
I'd prefer if they just added std::bitvector.
bigger_cheese 24 hours ago [-]
I work at an industrial plant we use boolean datatypes for stateful things like this. For example is Conveyor belt running (1) or stopped (0).
Sure, we could store the data by logging a start timestamp and a stop timestamp, but our data is stored on a time series basis (i.e. in a time-series DB, the timestamp is already the primary key for each record). When you are viewing the trend (such as on a control room screen) you get a nice square-wave type effect, so you can easily see when the state changes.
This also makes things like total run time easy to compute, just sum the flag value over 1 second increments to get number of seconds in a shift the conveyor was running for.
Sure, in my example you could just store something like motor current in amps (and we do) and use this to infer the conveyor state, but hopefully I've illustrated why an on/off flag is cleaner.
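The run-time computation mentioned above is about as simple as it sounds. A tiny worked example with made-up one-second samples:

```python
# One sample per second: 1 = conveyor running, 0 = stopped (made-up data).
samples = [0, 1, 1, 1, 0, 0, 1, 1]

# Summing the flag over 1-second increments gives seconds of run time.
running_seconds = sum(samples)
print(running_seconds)  # 5
```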
devnullbrain 1 days ago [-]
Having spent time in the embedded mines, I think the onus is on embedded to vocally differentiate itself from normal software development, not for it to be assumed that general software advice applies to embedded.
If embedded projects start using C standards from the past quarter century, they can join in on type discourse.
jilles 1 days ago [-]
* led status: on, off, non-responsive
* button status: idle, pressing, pressed
I'm with you by the way, but you can often think of a way to use enums instead (not saying you should).
edit: The 24th of October will be the 20th anniversary of that post.
nh23423fefe 1 days ago [-]
well yes. every boolean is iso to 2, and every 2 can be embedded in 3. and every N can be embedded in N+1
leni536 1 days ago [-]
> Using Enums, etc. to represent those values in the embedded world would be a monumental waste of memory, where a single bit would normally suffice.
In C++ you can use enums in bit-fields, not sure what the case is in C.
padjo 1 days ago [-]
I think it’s implicitly in the context of datastore design. In that context it feels like decent advice that would prevent a lot of mess.
kps 1 days ago [-]
They're boolean (single bit of information) but not boolean (single bit interpreted as meaning true or false). The LED isn't true or false, the microcontroller pin isn't true or false.
bsoles 1 days ago [-]
This is semantic pedantry. The association true/1/high and false/0/low is well-known and understood.
kps 1 days ago [-]
Plenty of signals are asserted (true) by being brought low, or have 1=low (e.g. CAN).
marcellus23 1 days ago [-]
huh? The LED isn't true or false, but whether the LED is on is true or false.
simondw 1 days ago [-]
And whether the LED is off is false or true.
bayindirh 1 days ago [-]
I'll expand on the first example, the datetime one.
Many user databases use soft-deletes where fields can change or be deleted, so user's actions can be logged, investigated or rolled back.
When user changes their e-mail (or adds another one), we add a row, and "verifiedAt" is now null. User verifies new email, so its time is recorded to the "verifiedAt" field.
Now, we have many e-mails for the same user with valid "verifiedAt" fields. Which one is the current one? We need another boolean for that (isCurrent). Selecting the last one doesn't make sense all the time, because we might have primary and backup mails, and the oldest one might be the primary one.
If we want to support multiple valid e-mails for a single account, we might need another boolean field "isPrimary". So it makes two additional booleans. isCurrent, isPrimary.
I can merge it into a nice bit field or a comma separated value list, but it defeats the purpose and wanders into code-golf territory.
Booleans are nice. Love them, and don't kick them around because they're small, and sometimes round.
kelnos 1 days ago [-]
I would say for your specific example, you shouldn't have boolean flags for that in the user_emails table, but instead have a primary_email column in the users table, that has a foreign key reference to the user_emails table. That way you can also ensure that the user always has exactly one primary email.
And for is_current, I still think a nullable timestamp could be useful there instead of a boolean. You might have a policy to delete old email addresses after they've been inactive for a certain amount of time, for example. But I'll admit that a boolean is fine there too, if you really don't care when the user removed an email from the current list. (Depending on usage patterns, you might even want to move inactive email addresses to a different table, if you expect them to accumulate over time.)
I think booleans are special in a weird way: if you think more about what you're using it for, you can almost always find a different way to store it that gives you richer information, if you need it.
bayindirh 15 hours ago [-]
> you can almost always find a different way to store it that gives you richer information, if you need it.
The trade-off here is DB speed/size and the secondary information you can gather from that DB.
In my eyes, after a certain point the DB shall not be the place to query and rebuild past actions from scattered data inside it. Instead, you can delegate these things to a well-formatted action log, so you can just query that one.
Unless it's absolutely necessary, tiering sounds and feels much more appropriate than bloating a DB.
taylodl 1 days ago [-]
What I'm getting out of this is boolean shouldn't be a state that's durably stored, it's ephemeral, an artifact of runtime processing. You wouldn't likely durably store a boolean in an OLTP store, but your ETL into the OLAP store may capture a boolean to simplify logic for all the systems using the OLAP store to drive decision support. That is, it's an optimization. That feels right, but I've never really thought through this before. Interesting!
jbreckmckye 1 days ago [-]
This makes intuitive sense because booleans are obviously reductive, as reductive as it gets (ideally stored in 1 bit), but for processing and analysis there's typically no reason to store data so sparingly
taylodl 1 days ago [-]
For processing and analysis, you're centralizing the compute of complex analysis and storing the result so downstream decision support systems can use the result as a criterion in their analysis - and not have to distribute, and maintain, that logic throughout the set of applications. A contrived example: is_valued_customer. This is a simple boolean, but its computation can be involved and you wouldn't want to have to replicate and maintain this logic throughout all the applications. But at the same time, it likely has no business being in the OLTP store.
jbreckmckye 1 days ago [-]
You might persist that value as an optimisation, but if you make it your source of truth, and discard your inputs, you better make sure you never ever ever ever have a bug in deriveValuedCustomer() or else you have lost data permanently
taylodl 1 days ago [-]
Good point - you wouldn't want to discard your inputs. You're going to need them should you ever redefine deriveValuedCustomer() - which is likely for a system that will be in production for 10-20 years or more.
RaftPeople 24 hours ago [-]
> You wouldn't likely durably store a boolean in an OLTP store
Parcel carrier shipment transaction:
ReturnServiceRequested: True/False
I can think of many more of these that are options of some transaction that should be stored and naturally are represented as boolean.
xg15 5 hours ago [-]
The weirdest boolean for me is the "is secure" flag of the web platform, that is true if the site is loaded via https with a valid cert or is served from localhost - and that decides if any advanced browser features are available or not.
It makes sense given the history of the web, but its semantics are pretty much the same as (old) Twitter's blue checkmark.
Both essentially say "the entity you're interacting with is really the one you believe they are" - but neither makes any attempt to actually find out what my belief is, nor do they give me any information to verify that belief myself. It's just a "trust me" in form of a boolean.
skybrian 23 hours ago [-]
Changing a boolean database field like 'is_confirmed' to a nullable datetime is a simple, cheap hack that records a little bit of information about an event. It's appropriate when you're not sure you care about the event.
If you know you actually care about the event, there are probably more fields to stuff into an event record, and then maybe you could save the event record's id instead?
But going too far in this direction based on speculation about what information you might need later is going to complicate the schema.
Please if you are in this situation do not take this advice. You just generate massive garbage abstractions upstream. If boolean arguments are out of hand, the problem isn't the boolean.
nurettin 1 days ago [-]
Isn't that the point? If booleans are out of hand, either you are trying to emulate a state machine or you are lacking enums. Or in case of 20 bool parameters, just make it a struct. Nobody will complain.
astrange 1 days ago [-]
Everyone's always trying to emulate a state machine - OOP objects are kind of just an unsafe informal state machine implementation.
Oddly, almost no one has tried providing actual state machines where you have to prove you've figured out what the state transitions are.
seveibar 9 hours ago [-]
Strongly disagree with the article. Enums make gradual migrations impossible in both databases and API design with third party consumers. Even in the example they gave where they recommended a user “role” instead of an is_admin boolean they are creating huge problems. Am I supposed to tell all my downstream API consumers that they need to refactor their code because we’re introducing a new value to “role” that covers “billing_manager”? There are literally zero downstream issues with adding is_billing_manager, and you get the additional representation benefit that the booleans can _both be true_, so I am not trapped into an exclusive role paradigm.
eviks 9 hours ago [-]
> tell all my downstream API consumers that they need to refactor their code because we’re introducing a new value to “role” that covers “billing_manager”?
With non-exhaustive enums you don't need to do that? But also, you can match on a single variant ignoring others for the admin role check?
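For what it's worth, here is a minimal Python sketch of the "non-exhaustive enum" idea, assuming a hypothetical `Role` enum and `is_admin` helper (neither comes from the article). An `UNKNOWN` member absorbs any role value the consumer has never heard of, so adding "billing_manager" upstream doesn't break an older client that only checks for admin:

```python
from enum import Enum


class Role(Enum):
    """Hypothetical role enum; UNKNOWN absorbs values added after this client shipped."""
    USER = "user"
    ADMIN = "admin"
    UNKNOWN = "unknown"

    @classmethod
    def _missing_(cls, value):
        # Enum calls this hook when Role(value) finds no matching member.
        return cls.UNKNOWN


def is_admin(raw_role: str) -> bool:
    # Match on the single variant we care about and ignore all the others.
    return Role(raw_role) is Role.ADMIN
```

This only covers the mutually-exclusive case; it does nothing for the point about wanting several roles to be true at once.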
APL and its descendants don't have booleans, just 0 and 1 [0]. Which is awesome. It allows for bitmasks, sums / reductions, and even conditionals via Iverson Brackets. [1]
To summarise: booleans should be derived, not stored
mrheosuper 1 days ago [-]
I don't like this pattern.
The author's example, checking whether "Datetime is null" to decide if the user is authorized or not, is not clear.
What if there are other fields associated with the login session, like login location? Now you don't know exactly what field to check.
Or if you receive NULL in the Datetime field, is it because the user has not logged in, or because there was a problem retrieving the Datetime?
This is just micro-optimization for no good reason
monkeyelite 1 days ago [-]
> Now you don't know exactly what field to check.
Yes you do - you have a helper method that encapsulates the details.
In the DB you could also make a view or generated column.
> This is just micro-optimization for no good reason
It’s conceptually simpler to have a representation with fewer states, and bugs are hopefully impossible. For example what would it mean for the bool authorized to be false but the authorized date time to be non-null?
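A rough sketch of the "helper method that encapsulates the details" idea, in Python. The `User` class and the `authorized_at` field are made-up names for illustration. There is only one stored field, so "authorized is false but the date is non-null" simply can't be written down:

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class User:
    # None means "never authorized"; a value records when it happened.
    authorized_at: Optional[datetime] = None

    @property
    def is_authorized(self) -> bool:
        # Callers ask this question and never inspect the timestamp directly.
        return self.authorized_at is not None

    def authorize(self, when: Optional[datetime] = None) -> None:
        # Default to the current UTC time if the caller does not supply one.
        self.authorized_at = when or datetime.now(timezone.utc)
```

Callers write `if user.is_authorized:` and don't need to know which column backs it.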
RaftPeople 24 hours ago [-]
> In the DB you could also make a view or generated column.
Or you could just use a boolean with a natural self describing name.
monkeyelite 22 hours ago [-]
My proposal is to use a null date.
Did you miss the part about contradictory states? Are you going to add some database constraints to your bool instead?
RaftPeople 20 hours ago [-]
If you need to store a value that has two states, use a boolean; don't overcomplicate it unless there is real value in the added complexity (which there sometimes is).
Regarding contradictory states:
Given that just about no DB is in 5th normal form, the possibility of contradictory states exists in almost every RDBMS, regardless of booleans. It seems like an argument that doesn't really have any strength to it.
monkeyelite 10 hours ago [-]
> If you need to store a value that has two states, use a boolean
Please refer to the article for context of this discussion.
Because the databases you have worked in are bad means we should not teach or advocate for correct data structure design?
RaftPeople 7 hours ago [-]
> Because the databases you have worked in are bad
You understand my point about 5th normal form, right? 99.99999999% of all databases are in approx (averaged across all tables) somewhere between 2.5 and 3rd normal form.
This means there are many examples in most databases of data that is implied by some other combination of data elements.
Do you store the calculated discounts and net selling price for line items after running the order through the promotion engine? Most systems do store things like that because people need to consume that data and it's either impractical or not possible for each consumer to run the data through the promotion engine every time they need to use that data.
Same thing goes for the total invoice qty, amount, taxes, etc.
This is one small example of the type of data dependencies that exist throughout most complex systems.
amelius 1 days ago [-]
> A lot of boolean data is representing a temporal event having happened. For example, websites often have you confirm your email. This may be stored as a boolean column, is_confirmed, in the database. It makes a lot of sense.
> But, you're throwing away data: when the confirmation happened. You can instead store when the user confirmed their email in a nullable column. You can still get the same information by checking whether the column is null. But you also get richer data for other purposes.
So the Boolean should be something else + NULL?
Now we have another problem ...
afc 1 days ago [-]
It should be: std::optional<Timestamp> (or Optional[datetime] or the equivalent in other languages)
If you're using a type system that is so poor that it won't easily detect statically places where you're not correctly handling the absent values, you do have a much bigger problem than using bool.
buckle8017 1 days ago [-]
It should be a timestamp of the last time the email was verified.
It's a surprisingly useful piece of data to have.
amelius 1 days ago [-]
Even more useful is a log of all the changes in the database. This gives you what you want, and it would be automatic for any data you store.
So, keep the Boolean, and use a log.
aydyn 1 days ago [-]
No? So you have to look at database history to extract information you think is useful?
That's a terrible database design.
amelius 1 days ago [-]
It's the basis behind Datomic, if I'm not mistaken.
You can easily search through history. The point is, it is better to do this in the design of the database than in the design of the schema.
So: "No?" -> "Yes!"
aydyn 1 days ago [-]
Okay, but for something like SQL this seems like a bad idea.
ck45 1 days ago [-]
One argument that I’m missing in the article is that with an enum, states are mutually exclusive, while with several booleans, there could be some limbo state of several bool columns with value true, e.g. is_guest and is_admin, which is an invalid state.
cjs_ac 1 days ago [-]
In that case, you set the enumeration up to use separate bit flags for each boolean, e.g., is_guest is the least significant bit, is_admin is the second least significant bit, etc. Of course, then you've still got a bunch of booleans that you need to test individually, but at least they're in the same column.
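A small Python sketch of the bit-flag layout described above, using the standard library's `enum.Flag`. The `Roles` names and the "guest cannot also be admin" rule are assumptions for illustration. It also shows the caveat: the flags live in one value, but the invalid combination is still representable and has to be checked separately:

```python
from enum import Flag, auto


class Roles(Flag):
    GUEST = auto()    # value 1: least significant bit
    ADMIN = auto()    # value 2: second least significant bit
    BILLING = auto()  # value 4


def is_valid(roles: Roles) -> bool:
    # Assumed business rule: a guest cannot also be an admin.
    return not (Roles.GUEST in roles and Roles.ADMIN in roles)
```

So `Roles.ADMIN | Roles.BILLING` passes the check, while `Roles.GUEST | Roles.ADMIN` is storable but rejected by `is_valid`.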
cratermoon 1 days ago [-]
look up the typestate pattern.
coin 1 days ago [-]
> But, you're throwing away data
Often it’s intentional for privacy. Record no more data than what’s needed.
chikinpotpi 1 days ago [-]
I generally prefer to let one value mean one thing.
Allowing the presence of a dateTime (UserVerificationDate for example) to have a meaning in addition to its raw value seems safe and clean. But over time in any system these double meanings pile up and lose their context.
Having two fields (i.e. UserHasVerified, UserVerificationDate) doesn't waste THAT much more space, and leaves no room for interpretation.
jerf 1 days ago [-]
But it does leave room for "UserHasVerified = false, UserVerificationDate = 2025/08/25" and "UserHasVerified = true, UserVerificationDate = NULL".
The better databases can be given a key to force the two fields to match. Most programming languages can be written in such a way that there's no way to separate the two fields and represent the broken states I show above.
However the end result of doing that ends up isomorphic to simply having the UserVerificationDate also indicate verification. You just spent more effort to get there. You were probably better off with a comment indicating that "NULL" means not verified.
In a perfect world I would say it's obvious that NULL means not verified. In the real world I live in I encounter random NULLs that do not have a clear intentionality behind them in my databases all the time. Still, some comments about this (or other documentation) would do the trick, and the system should still tend to evolve towards this field being used correctly once it gets wired in to the first couple of uses.
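To make the "force the two fields to match" part concrete, here is a hedged sketch using SQLite through Python's standard `sqlite3` module. It uses a CHECK constraint rather than a key, and the table and column names are made up. The constraint rejects exactly the two broken states listed above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE users (
        id INTEGER PRIMARY KEY,
        user_has_verified INTEGER NOT NULL CHECK (user_has_verified IN (0, 1)),
        user_verification_date TEXT,
        -- The flag must agree with the presence of a date.
        CHECK ((user_has_verified = 1) = (user_verification_date IS NOT NULL))
    )
    """
)


def try_insert(has_verified, verification_date):
    """Return True if the row is accepted, False if a CHECK constraint rejects it."""
    try:
        conn.execute(
            "INSERT INTO users (user_has_verified, user_verification_date) "
            "VALUES (?, ?)",
            (has_verified, verification_date),
        )
        return True
    except sqlite3.IntegrityError:
        return False
```

Once the constraint is in place the boolean carries no information the date doesn't already carry, which is the isomorphism point.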
cratermoon 1 days ago [-]
> Having two fields (i.e. UserHasVerified, UserVerificationDate)
What happens when they get out of sync?
Smaug123 14 hours ago [-]
My preferred framing of this is: "You told me a condition was `true`, presumably because you had evidence for it. Why are you telling me you once had evidence? Just give me the evidence!".
Don't tell me that an event happened; give me the event that happened. (Sure, project it down to something if you like for efficiency; throwing away most of the information is what gives the timestamps example.)
Don't tell me simply whether a user is an admin; tell me what the user is. (That's the enums example.)
Logically speaking, `bool` is equivalent to `Optional<unit>`, and in fact that's frequently what it's used for. Phrased that way, it's much more obvious that this representation doesn't match all that many domains very well; it's clearly useful for performance (because throwing away unnecessary data is a standard performance technique), but it's also clearly a candidate premature optimisation.
alphazard 1 days ago [-]
The timestamps instead of boolean thing is something good engineers stumble upon pretty reliably. One gotcha is the database might be weird about indexing nulls. I'm not going to give an example because you should really read the docs for your specific database if this matters.
The ever growing set of boolean flags seems to be an attractor state for database schemas. Unless you take steps to avoid/prohibit it, people will reach for a single boolean flag for their project/task. Fortunately it's pretty easy to explain why it's bad with a counting argument. e.g. There are this many states with booleans, and this fraction are valid vs. this many with the enum and this fraction are valid. There is no verification, so a misunderstanding is more likely to produce an invalid state than a valid state.
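The counting argument can be sketched in a few lines of Python. The three flag names and the "exactly one may be true" rule are assumptions picked for illustration:

```python
from enum import Enum
from itertools import product

FLAGS = ("is_guest", "is_member", "is_admin")


def is_valid(state: dict) -> bool:
    # Assumed rule: the three roles are mutually exclusive and one must be set.
    return sum(state.values()) == 1


# Every combination the three boolean columns can hold: 2**3 of them.
all_states = [
    dict(zip(FLAGS, bits)) for bits in product((False, True), repeat=len(FLAGS))
]
valid_states = [s for s in all_states if is_valid(s)]


class Role(Enum):
    # The enum has exactly as many members as there are valid states.
    GUEST = "guest"
    MEMBER = "member"
    ADMIN = "admin"
```

Here `len(all_states)` is 8 and `len(valid_states)` is 3, so 5 of the 8 boolean combinations are invalid, while all 3 enum members are valid. Each extra flag doubles the total.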
pixelfarmer 1 days ago [-]
There can be verification for such things.
the__alchemist 1 days ago [-]
I read an article with the same premise here a few years ago.
A Boolean is a special, universal case of an enum (or whatever you prefer to call these choice types...) that is semantically valid for many uses.
I'm also an enum fanboy, and agree with the article's examples. Its conclusion of not using booleans because enums are more appropriate in some cases is wrong.
Some cases are good uses of booleans. If you find a Boolean isn't semantically clear, or you need a third variant, then move to an enum.
zavec 1 days ago [-]
Oh this is fantastic! I'm giving a talk in about a month at work on how to use the python type system in useful ways to catch more bugs before runtime, and this seems like a great point to throw in there as an aside at the very least!
arethuza 1 days ago [-]
I once, briefly, worked with a developer who believed that you should never use primitive types for fields or parameters...
zwieback 1 days ago [-]
Maybe for the DB domain the author is talking about, but the nice thing about a bool is that it's true or false. I don't have to dig around documentation or look through the code to find the convention for converting an enum, datetime, etc. to true/false. 1970/1/1 (I was four years old then, just sayin), -6000 or something else?
Nullable helps a lot here but not all languages support that the same way.
jmyeet 1 days ago [-]
My favorite Java code I've ever seen is:
@Nullable Optional<Boolean> foo;
For when 3 values for a boolean just aren't enough.
Here are two rules I learned from data modelling and APIs many years ago:
1. If you don't do arithmetic on it, it's not a number. ID int columns and foreign keys are excluded from this. But a phone number or an SSN or an employee ID (that is visible to people) should never be a number; and
2. It's almost never a boolean. It's almost always an enum.
Enums are just better. You can't accidentally pass a strong enum into the wrong parameter. Enums can be extended. There's nothing more depressing than seeing:
This goes for returning success from a function too.
kelnos 1 days ago [-]
> @Nullable Optional<Boolean> foo;
To be (somewhat facetiously) fair, that's just JSON. The key can be not-present, present but null, or it can have a value. I usually use nested Options for that, not nulls, but it's still annoying to represent.
In Rust I could also do
enum JsonValue<T> {
Missing,
Null,
Present(T),
}
But then I'd end up reinventing Option semantics, and would need to do a bunch of conversions when interacting with other stuff.
eflim 1 days ago [-]
I would add counters to this list. Start from zero (false), and then you know not just whether an event has occurred, but how many times.
throwaway81523 23 hours ago [-]
Aka "Boolean blindness", look it up.
fenesiistvan 1 days ago [-]
I was hoping to read about bitfields or bit flags.
usernamed7 1 days ago [-]
replace "should" with "could".
I do think it's wise to consider when a boolean could be inferred from some other mechanism, but I also use booleans a lot because they are the best solution for many problems. Sure, sometimes what is now a boolean may need to become something else later, like an enum, and that's fine too. But I would not suggest jumping to those out of the gate.
Booleans are good toggles and representatives of 2 states like on/off, public/private. But sometimes an association, or datetime, or field presence can give you more data and said data is more useful to know than a separate attribute.
burnt-resistor 18 hours ago [-]
The only, universally-valid advice is: it depends. There are no hard and fast universal rules except carefully deciding when and when not to break conventions and guidelines.
Depending on the complexity of the system and its user requirements, hard-coding roles as an enum could fall anywhere on the spectrum from a good to a bad idea. It would be a terrible thing if user-defined roles were a requirement, because an enum can't model a dynamic set of ad-hoc, user-defined groups. Carefully and defensively planning for the evolution of requirements without over-optimizing, over-engineering, or adding too much extra code is part of the balance that must be struck. It could be a good thing if it were a very simple site that just needed to ship ASAP.
_dain_ 1 days ago [-]
Booleans don't "remember" what they mean. They're just a `true` or a `false`, the association with the `is_authenticated` variable or whatever has to be maintained by programmer discipline. But when you have an enum variant like `Authenticated`, that's encoded in the value itself, helped by the type system. It can't be confused with some other state or condition.
Booleans beget more booleans. Once you have one or two argument flags, they tend to proliferate, as programmers try to cram more and more modalities into the same function signature. The set of possible inputs grows with 2^N, but usually not all of them are valid combinations. This is a source of bugs. Again, enums / sum-types solve this because you can make the cardinality of the input space precisely equal to the number of valid inputs.
Also, doesn't "... a weird hack that wastes space" contradict "Avoid premature optimization"?
Which is what booleans do when the requirement is two states.
> Also, doesn't "... a weird hack that wastes space" contradict "Avoid premature optimization"?
No, because including the timestamp "just in case" is the premature optimization.
> wastes space > premature optimisation
A timestamp is a witness of when the email was verified. Since whether they’ve verified can be calculated from it, having both is not only redundant but allows invalid states to be represented.
Cases like email verified are often followed by the need to know when. Say an expiry system. Going from bools, you are faced with the hard choice of how to migrate existing state.
Databases also warrant more care as a source of persistent state - accessed by multiple versions of your software. If you don’t have this persistency, then it matters less.
> any other fields that happen to be strings or numbers or blobs were changed > implement proper event logging
Event logging is orthogonal to your database state. If your business logic needs dirty flags or timestamps they should be stored in the database, not queried.
And if you do need it for other fields, adding the bool is the perfect time to ask yourself if what you need is a timestamp.
> way we don't start with floats rather than ints "just in case" we need fractional values later on
Floats are a subset of int, and a natural migration. A bool can be calculated from a timestamp, but not the other way.
And often not. That's the point. Avoid premature optimization. (FWIW, I've never encountered a system in my life where a successful email verification then expired after a period of time.)
> having both is not only redundant but allows invalid states to be represented.
That's a different topic. That's about what to do when you know you need the timestamp. The article is about using a timestamp when you don't have a timestamp requirement.
> they should be stored in the database, not queried.
I don't know what that means. Everything in the database is queried. And you can store your events in the database, in one or more event log tables.
> Floats are a subset of int, and a natural migration.
I think you meant to say the opposite, but even that's not true because of precision. And so too are enums a natural migration from booleans. That's the point -- start simple and extend as needed.
KISS, YAGNI, and then actually analyze your requirements to understand what the mature schema looks like. A boolean is the simplest thing that can possibly work, usually. Do that first and see how your requirements evolve, then build the database schema that reflects your actual requirements.
Although it always depends on what exactly you're really doing.
This is not premature optimization, sometimes booleans can be extremely hard to change so it's not as easy as "just refactor".
Basically, what had happened was that the developer had written a function ("serialize()", in this example) and then later discovered that they wanted slightly different behaviour in some cases (maybe pretty printed or something). Since Lua allows you to change arity of a function without changing call-sites (missing arguments are just nil), they had just added a flag as an argument. And then another flag. And then another.
I now believe very strongly that you should virtually never have a boolean as an argument to a function. There are exceptions, but not many.
And you can get the same problem with any argument type. What do the arguments in
mean?
In general, you're going to need some kind of way to communicate the purpose - named parameters, IDE autocomplete, whatever - and once you have that then booleans are not worse than any other type.
Basically, if you have a function that takes a boolean in your API, just have two functions instead with descriptive names.
Yeah right like I’m going to expand this function that takes 10 booleans into 1024 functions. I’m sticking with it. /s
Tons of well-written functions have many more potential code paths than that. And they're easy to reason about because the parameters don't interact much.
Just think of plotting libraries with a ton of optional parameters for showing/hiding axes, ticks, labels, gridlines, legend, etc.
If this was my function I would probably make the parameters attributes of a TurboEncabulator class and add some setter methods that can be chained, e.g. Rust-style:
I absolutely agree named arguments are the way to go. But my comment wasn't in the thread about that.
https://en.wikipedia.org/wiki/Turbo_encabulator
You could of course store the boolean in a variable and have the variable name speak for its meaning but at that point might as well just use an enum and do it proper.
For things like strings you either have a variable name - ideally a well describing one - or a string literal which still contains much more information than simply a true or false.
I personally believe very strongly that people shouldn’t use programming languages lacking basic functionalities.
I believe IDEs had the feature of showing me the function header with a mouse hover 20+ years ago.
It's honestly what I miss the most about python: keyword args, keyword-only args, positional-only args.
What I don't miss is the unholy abomination of **kwargs...
Enums are better because you can carve out precisely the state space you want and no more.
serialize(someObject, prettyPrint:true)
NB I have no idea whether Lua has keyword arguments but if your language does then that would seem to address your particular issue?
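For comparison, this is roughly what that looks like in a language that does have keyword arguments. It is a Python sketch, not Lua, and the `serialize` signature is made up. The bare `*` makes the flags keyword-only, so a call site can't pass an anonymous `True`:

```python
import json


def serialize(obj, *, pretty_print=False, sort_keys=False):
    """Hypothetical serializer; everything after the bare * is keyword-only."""
    indent = 2 if pretty_print else None
    return json.dumps(obj, indent=indent, sort_keys=sort_keys)


compact = serialize({"a": 1})
pretty = serialize({"a": 1}, pretty_print=True)
```

Calling `serialize({"a": 1}, True)` raises a `TypeError`, so every boolean at a call site carries its name.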
If the tiny fraction is small enough, just write different functions for it ("serialize()" and "prettyPrint()"). If it's not feasible to do it, have a good long think about the API design and if you can refactor it nicely. If the number of combinations is enormous, something like the "builder pattern" is probably a good idea.
It's a hard problem to solve, because there's all sorts of programming principles in tension here ("don't repeat yourself", "make illegal states unrepresentable", "feature flags are bad") and in your way of solving a practical problem. It's interesting to study how popular libraries do this. libcurl is a good example, which has a GAZILLION options for how to do a request, and you do it "statefully" by setting options [1]. libcairo for drawing vector graphics is another interesting example, where you really do have a combinatorial explosion of different shapes, strokes, caps, paths and fills [2]. They also do it statefully.
[1]: https://curl.se/libcurl/c/curl_easy_setopt.html
[2]: https://cairographics.org/manual/cairo-cairo-t.html
The best way in many languages for flags is using unsigned integers that are bitwise-ORed together.
In pseudocode:
Whatever language you are using, it probably has some namespaced way to define flags as `(1 << 0)` and `(1 << 1)` etc.
options = new SerializeOptions();
options.PrettyPrint = true;
options.Flag2 = "red"
options.Flag3 = 27;
serialize(someObject, options)
Why is that the "best" way?
"Best way" is often contextual and subjective. In this context (boolean flags to a function), this way is short, readable and scoped, even in C which doesn't even have scoped namespaces.
Maybe there are better ways, and maybe you have a different "best way", but then someone can legitimately ask you about your "best way": `Why is that the "best" way?`
The only objective truth that one can say about a particular way to do something is "This is not the worst way".
Why waste a whole byte on a bool that has one bit of data, when you can pack the equivalent of eight bools into the same space as an uint8_t for free?
I've done exactly what you propose on different projects but I would never call it the "best" method, merely one that conserves memory but with typical trade-offs like all solutions.
https://elixirschool.com/en/lessons/basics/functions#functio...
Really? That sounds unjustified outside of some specific context. As a general rule I just can't see it.
I don't see what's fundamentally wrong with it. What's the alternative? Multiple static functions with different names corresponding to the flags and code duplication, plus switch statements to select the right function?
Or maybe you're making some other point?
However, personally I agree with the advice, in another context: Function return types, and if-statements.
Often, some critical major situation or direction is communicated with returned booleans. They will indicate something like 'did-optimizer-pass-succeed-or-run-to-completion-or-finish', stuff like that. And this will determine how the program proceeds next (retry, abort, continue, etc.)
A problem arises when multiple developers (maybe yourself, in 3 months) need to communicate about and understand this correctly.
Sometimes, that returned value will mean 'function-was-successful'. Sometimes it means 'true if there were problems/issues' (the way to this perspective, is when the function is 'checkForProblems'/verify/sanitycheck() ).
Another way to make confusion with this, is when multiple functions are available to plug in or proceed to call - and people assume they all agree on "true is OK, false is problems" or vice versa.
A third and maybe most important variant is when 'the return value doesn't quite mean what you thought'. - 'I thought it meant "a map has been allocated".' - but it means 'a map exists' (but has not necessarily been allocated, if it was pre-existing).
All this can be attacked with two-value enums: NO_CONVERSION_FAILED=0, YES_CONVERSION_WAS_SUCCESSFUL=1. (And yes, I see the peril in putting 0 and 1 there, but any value will be dangerous.)
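A minimal Python sketch of the two-value-enum return, with a hypothetical `ConversionResult` enum and a toy `convert` function standing in for the real operation:

```python
from enum import Enum


class ConversionResult(Enum):
    FAILED = 0
    SUCCEEDED = 1


def convert(text: str):
    """Toy conversion: parse an int and say explicitly whether it worked."""
    try:
        return ConversionResult.SUCCEEDED, int(text)
    except ValueError:
        return ConversionResult.FAILED, None


status, value = convert("42")
if status is ConversionResult.SUCCEEDED:
    doubled = value * 2
```

The call site has to name the outcome it is testing for, so "true means OK" versus "true means problems" can't be silently mixed up. One Python-specific caveat: members of a plain `Enum` are all truthy, so `if status:` would be wrong here; compare against the member explicitly.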
Making a boolean a datetime, just in case you ever want to use the data, is not the kind of pattern that makes your code clearer in my opinion. The fact that you only save a binary true/false value tells the person looking at the code a ton about what the program currently is meant to do.
The angle I'd approach it from is this: recording whether an email is verified as a boolean is actually misguided - that is, the intent is wrong.
The actual things of interest are the email entity and the verification event. If you record both, 'is_verified' is trivial to derive.
However, consider if you now must implement the rule that "emails are verified only if a verification took place within the last 6 months." Recording verifications as events handles this trivially, whilst this doesn't work with booleans.
Some other examples - what is the rate of verifications per unit of time? How many verification emails do we have to send out?
Flipping a boolean when the first of these events occurs without storing the event itself works in special cases, but not in general. Storing a boolean is overly rigid, throws away the underlying information of interest, and overloads the model with unrelated fields (imagine storing say 7 or 8 different kinds of events linked to some model).
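A sketch of deriving `is_verified` from stored verification events, in Python. The function name, the list-of-timestamps representation, and 183 days as an approximation of "6 months" are all assumptions for illustration:

```python
from datetime import datetime, timedelta


def is_verified(verification_times, now, max_age=timedelta(days=183)):
    """True if any recorded verification happened within max_age before now."""
    return any(
        timedelta(0) <= now - verified_at <= max_age
        for verified_at in verification_times
    )


now = datetime(2025, 8, 25)
recent = [datetime(2024, 1, 1), datetime(2025, 6, 1)]
stale = [datetime(2024, 1, 1)]
```

With these inputs, `is_verified(recent, now)` is true and `is_verified(stale, now)` is false. Changing the policy means changing `max_age`, not migrating data, and the same list of events answers the rate-per-unit-of-time question too.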
Or, your assumption about the intent is wrong. Many (most?) times, the intent is precisely whether an email is verified. That's all. And that's OK if that's all the project needs.
> Storing a boolean is overly rigid, throws away the underlying information of interest, and overloads the model with unrelated fields
Also, storing a boolean can most accurately reflect intent, avoid hoarding unnecessary and unneeded information, and maximize the model's conceptual clarity.
If you decided to make your boolean a timestamp, and now realize you need a field with 3 states, now what?
If you'd kept your boolean, you could convert the field from BOOL to TINYINT without changing any data. [0, 1] becomes [0, 1, 2] easily.
> Making a boolean a datetime, just in case you ever want to use the data, is not the kind of pattern that makes your code clearer in my opinion.
I don't follow at all. If your field is named for when a thing happened (`_at` suffix), then that seems very clear. Also, even if you never expose this via UI, it can be a godsend for debugging: "Oh, it was updated on XXXX-XX-XX, that's when we had Y bug" or "that's why Z service was having an issue".
How about using Booleans for binary things? Is the LED on or off, is the button pressed or not, is the microcontroller pin low or high? Using Enums, etc. to represent those values in the embedded world would be a monumental waste of memory, where a single bit would normally suffice.
And usually you use operations to isolate the bit from a status byte or word, which is how it's also stored and accessed in registers anyway.
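For illustration, the masking operations look roughly like this. It is sketched in Python rather than C, and the bit positions and names are made up for the example, not taken from any real register map.

```python
# Hypothetical bit layout of an 8-bit status register.
LED_ON = 1 << 0
BUTTON_PRESSED = 1 << 1
PIN_HIGH = 1 << 2


def is_set(status: int, mask: int) -> bool:
    """Isolate one bit from the status byte."""
    return (status & mask) != 0


def set_flag(status: int, mask: int) -> int:
    return status | mask


def clear_flag(status: int, mask: int) -> int:
    # Mask to 8 bits so the result stays a valid byte value.
    return status & ~mask & 0xFF


status = 0b0000_0101  # LED on, pin high, button not pressed
print(is_set(status, LED_ON), is_set(status, BUTTON_PRESSED))
```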
So it's still not a boolean type, despite expressing boolean things.
Enums also help keep the state machine clear. {Init, on, off, error} captures a larger part of the program behavior in a clear format than 2-3 binary flags, despite describing the same function. Every new boolean flag is a two-state composite state machine hiding edge cases.
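A rough sketch of the comparison, with hypothetical flag names and a made-up transition table: three booleans admit eight combinations, while the enum has exactly four states and the allowed transitions can be written down explicitly.

```python
from enum import Enum, auto
from itertools import product


class LedState(Enum):
    INIT = auto()
    ON = auto()
    OFF = auto()
    ERROR = auto()


# Assumed transition table, for illustration only.
ALLOWED = {
    LedState.INIT: {LedState.OFF, LedState.ERROR},
    LedState.OFF: {LedState.ON, LedState.ERROR},
    LedState.ON: {LedState.OFF, LedState.ERROR},
    LedState.ERROR: {LedState.INIT},
}


def transition(current: LedState, target: LedState) -> LedState:
    """Return the new state, or raise if the transition is not allowed."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition {current.name} -> {target.name}")
    return target


# The same behavior as three hypothetical flags: initialized, on, error.
flag_combinations = list(product([False, True], repeat=3))
print(len(flag_combinations), "flag combinations vs", len(LedState), "enum states")
```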
I'd prefer if they just added std::bitvector.
Sure, we could store the data by logging a start timestamp and a stop timestamp, but our data is stored on a time-series basis (i.e. in a time-series DB, where the timestamp is already the primary key for each record). When you are viewing the trend (such as on a control room screen), you get a nice square-wave effect, so you can easily see when the state changes.
This also makes things like total run time easy to compute, just sum the flag value over 1 second increments to get number of seconds in a shift the conveyor was running for.
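As a small sketch, assuming one 0/1 sample per second (the sample values below are invented):

```python
# One running-flag sample per second: 1 = conveyor running, 0 = stopped.
samples = [0, 1, 1, 1, 0, 0, 1, 1, 0, 1]

# Total run time is just the sum of the flag over the 1-second samples.
run_seconds = sum(samples)

# Edges of the "square wave": count places where consecutive samples differ.
state_changes = sum(1 for a, b in zip(samples, samples[1:]) if a != b)

print(run_seconds, state_changes)  # 6 5
```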
Sure, in my example you could just store something like motor current in amps (and we do) and use this to infer the conveyor state, but hopefully I've illustrated why an on/off flag is cleaner.
If embedded projects start using C standards from the past quarter century, they can join in on type discourse.
I'm with you by the way, but you can often think of a way to use enums instead (not saying you should).
edit: The 24th of October will be the 20th anniversary of that post.
In C++ you can use enums in bit-fields, not sure what the case is in C.
Many user databases use soft-deletes where fields can change or be deleted, so user's actions can be logged, investigated or rolled back.
When a user changes their e-mail (or adds another one), we add a row, and "verifiedAt" is now null. When the user verifies the new e-mail, the time is recorded in the "verifiedAt" field.
Now, we have many e-mails for the same user with valid "verifiedAt" fields. Which one is the current one? We need another boolean for that (isCurrent). Selecting the last one doesn't make sense all the time, because we might have primary and backup mails, and the oldest one might be the primary one.
If we want to support multiple valid e-mails for a single account, we might need another boolean field "isPrimary". So it makes two additional booleans. isCurrent, isPrimary.
I can merge it into a nice bit field or a comma separated value list, but it defeats the purpose and wanders into code-golf territory.
Booleans are nice. Love them, and don't kick them around because they're small, and sometimes round.
And for is_current, I still think a nullable timestamp could be useful there instead of a boolean. You might have a policy to delete old email addresses after they've been inactive for a certain amount of time, for example. But I'll admit that a boolean is fine there too, if you really don't care when the user removed an email from the current list. (Depending on usage patterns, you might even want to move inactive email addresses to a different table, if you expect them to accumulate over time.)
I think booleans are special in a weird way: if you think more about what you're using it for, you can almost always find a different way to store it that gives you richer information, if you need it.
The trade-off here is DB speed/size and the secondary information you can gather from that DB.
In my eyes, after a certain point the DB shall not be the place to query and rebuild past actions from scattered data inside it. Instead, you can delegate these things to a well-formatted action log, so you can just query that one.
Unless it's absolutely necessary, tiering sounds and feels much more appropriate than bloating a DB.
Parcel carrier shipment transaction:
ReturnServiceRequested: True/False
I can think of many more of these that are options of some transaction that should be stored and naturally are represented as boolean.
It makes sense given the history of the web, but its semantics are pretty much the same as (old) Twitter's blue checkmark.
Both essentially say "the entity you're interacting with is really the one you believe they are" - but neither makes any attempt to actually find out what my belief is, nor do they give me any information to verify that belief myself. It's just a "trust me" in form of a boolean.
If you know you actually care about the event, there are probably more fields to stuff into an event record, and then maybe you could save the event record's id instead?
But going too far in this direction based on speculation about what information you might need later is going to complicate the schema.
See Paul's comment in the other thread for more: https://news.ycombinator.com/item?id=44423995
Oddly, almost no one has tried providing actual state machines where you have to prove you've figured out what the state transitions are.
With non-exhaustive enums you don't need to do that? But also, you can match on a single variant ignoring others for the admin role check?
That boolean should probably be something else - https://news.ycombinator.com/item?id=44423995 - June 2025 (1 comment, but it's solid)
[0] https://aplwiki.com/wiki/Boolean
[1] https://en.m.wikipedia.org/wiki/Iverson_bracket
The author's example, checking whether "Datetime is null" to decide if a user is authorized, is not clear.
What if there are other fields associated with the login session, like login location? Now you don't know exactly which field to check.
Or if you receive null in the datetime field, is it because the user has not logged in, or because there was a problem retrieving the datetime?
This is just micro-optimization for no good reason
Yes you do - you have a helper method that encapsulates the details.
In the DB you could also make a view or generated column.
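Something like the following, as a sketch. The `User` class and its field names are hypothetical; the point is only that callers ask `is_email_verified` and never inspect the timestamp directly.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class User:
    email: str
    # None means "not verified"; the convention lives in exactly one place.
    email_verified_at: Optional[datetime] = None

    @property
    def is_email_verified(self) -> bool:
        """Helper that hides the nullable-timestamp detail from callers."""
        return self.email_verified_at is not None


pending = User("new@example.com")
confirmed = User("old@example.com", datetime(2025, 6, 1, 12, 0))
print(pending.is_email_verified, confirmed.is_email_verified)  # False True
```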
> This is just micro-optimization for no good reason
It’s conceptually simpler to have a representation with fewer states, and bugs are hopefully impossible. For example what would it mean for the bool authorized to be false but the authorized date time to be non-null?
Or you could just use a boolean with a natural self describing name.
Did you miss the part about contradictory states? Are you going to add some database constraints to your bool instead?
Regarding contradictory states:
Given that just about no DB is in 5th normal form, the possibility of contradictory states exist in almost every RDBMS, regardless of booleans. It seems like an argument that doesn't really have any strength to it.
Please refer to the article for context of this discussion.
Because the databases you have worked in are bad means we should not teach or advocate for correct data structure design?
You understand my point about 5th normal form, right? 99.99999999% of all databases are in approx (averaged across all tables) somewhere between 2.5 and 3rd normal form.
This means there are many examples in most databases of data that is implied by some other combination of data elements.
Do you store the calculated discounts and net selling price for line items after running the order through the promotion engine? Most systems do store things like that, because people need to consume that data and it's either impractical or not possible for each consumer to run the data through the promotion engine every time they need to use it.
Same thing goes for the total invoice qty, amount, taxes, etc.
This is one small example of the type of data dependencies that exist throughout most complex systems.
> But, you're throwing away data: when the confirmation happened. You can instead store when the user confirmed their email in a nullable column. You can still get the same information by checking whether the column is null. But you also get richer data for other purposes.
So the Boolean should be something else + NULL?
Now we have another problem ...
If you're using a type system that is so poor that it won't easily detect statically places where you're not correctly handling the absent values, you do have a much bigger problem than using bool.
It's a surprisingly useful piece of data to have.
So, keep the Boolean, and use a log.
That's a terrible database design.
You can easily search through history. The point is, it is better to do this in the design of the database than in the design of the schema.
So: "No?" -> "Yes!"
Often it’s intentional for privacy. Record no more data than what’s needed.
Allowing the presence of a dateTime (UserVerificationDate for example) to have a meaning in addition to its raw value seems safe and clean. But over time in any system these double meanings pile up and lose their context.
Having two fields (i.e. UserHasVerified, UserVerificationDate) doesn't waste THAT much more space, and leaves no room for interpretation.
The better databases can be given a key to force the two fields to match. Most programming languages can be written in such a way that there's no way to separate the two fields and represent the broken states I show above.
However the end result of doing that ends up isomorphic to simply having the UserVerificationDate also indicate verification. You just spent more effort to get there. You were probably better off with a comment indicating that "NULL" means not verified.
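To make the "force the two fields to match" option concrete, here is a sketch using a CHECK constraint in SQLite via Python's `sqlite3` module. The table and column names are assumptions for the example, and other databases spell this differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE users (
        id INTEGER PRIMARY KEY,
        user_has_verified INTEGER NOT NULL CHECK (user_has_verified IN (0, 1)),
        user_verification_date TEXT,
        -- The flag must agree with the presence of the date.
        CHECK (user_has_verified = (user_verification_date IS NOT NULL))
    )
    """
)


def try_insert(has_verified, verification_date):
    """Return True if the row is accepted, False if the constraint rejects it."""
    try:
        conn.execute(
            "INSERT INTO users (user_has_verified, user_verification_date) "
            "VALUES (?, ?)",
            (has_verified, verification_date),
        )
        return True
    except sqlite3.IntegrityError:
        return False


print(try_insert(1, "2025-06-01T00:00:00"))  # consistent row
print(try_insert(1, None))                   # flag set but no date: rejected
```

Once the constraint is in place, the boolean carries no information the date does not already carry, which is the isomorphism described above.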
In a perfect world I would say it's obvious that NULL means not verified. In the real world I live in I encounter random NULLs that do not have a clear intentionality behind them in my databases all the time. Still, some comments about this (or other documentation) would do the trick, and the system should still tend to evolve towards this field being used correctly once it gets wired in to the first couple of uses.
What happens when they get out of sync?
Don't tell me that an event happened; give me the event that happened. (Sure, project it down to something if you like for efficiency; throwing away most of the information is what gives the timestamps example.)
Don't tell me simply whether a user is an admin; tell me what the user is. (That's the enums example.)
Logically speaking, `bool` is equivalent to `Optional<unit>`, and in fact that's frequently what it's used for. Phrased that way, it's much more obvious that this representation doesn't match all that many domains very well; it's clearly useful for performance (because throwing away unnecessary data is a standard performance technique), but it's also clearly a candidate premature optimisation.
The ever growing set of boolean flags seems to be an attractor state for database schemas. Unless you take steps to avoid/prohibit it, people will reach for a single boolean flag for their project/task. Fortunately it's pretty easy to explain why it's bad with a counting argument. e.g. There are this many states with booleans, and this fraction are valid vs. this many with the enum and this fraction are valid. There is no verification, so a misunderstanding is more likely to produce an invalid state than a valid state.
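The counting argument can be spelled out in a few lines. The three flags below are hypothetical, and the validity rule (exactly one flag set) is an assumption chosen for the example.

```python
from enum import Enum
from itertools import product

# Three hypothetical, mutually exclusive flags on one row.
FLAGS = ("is_draft", "is_published", "is_archived")


def is_valid(combo) -> bool:
    # Assumed rule: exactly one of the flags may be true.
    return sum(combo) == 1


all_combos = list(product([False, True], repeat=len(FLAGS)))
valid_combos = [c for c in all_combos if is_valid(c)]


class Status(Enum):
    DRAFT = "draft"
    PUBLISHED = "published"
    ARCHIVED = "archived"


print(f"booleans: {len(valid_combos)} of {len(all_combos)} states are valid")
print(f"enum: {len(Status)} of {len(Status)} states are valid")
```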
A Boolean is a special, universal case of an enum (or whatever you prefer to call these choice types...) that is semantically valid for many uses.
I'm also an enum fanboy, and agree with the article's examples. Its conclusion of not using booleans because enums are more appropriate in some cases is wrong.
Some cases are good uses of booleans. If you find a Boolean isn't semantically clear, or you need a third variant, then move to an enum.
Nullable helps a lot here but not all languages support that the same way.
Here are two rules I learned from data modelling and APIs many years ago:
1. If you don't do arithmetic on it, it's not a number. ID int columns and foreign keys are excluded from this. But a phone number or an SSN or an employee ID (that is visible to people) should never be a number; and
2. It's almost never a boolean. It's almost always an enum.
Enums are just better. You can't accidentally pass a strong enum into the wrong parameter. Enums can be extended. There's nothing more depressing than seeing:
This goes for returning success from a function too.
To be (somewhat facetiously) fair, that's just JSON. The key can be not-present, present but null, or it can have a value. I usually use nested Options for that, not nulls, but it's still annoying to represent.
In Rust I could also do
But then I'd end up reinventing Option semantics, and would need to do a bunch of conversions when interacting with other stuff.
I do think it's wise to consider when a boolean could be inferred from some other mechanism, but I also use booleans a lot because they are the best solution for many problems. Sure, sometimes what is now a boolean may need to become something else later, like an enum, and that's fine too. But I would not suggest jumping to those out of the gate.
Booleans are good toggles and representatives of 2 states like on/off, public/private. But sometimes an association, or datetime, or field presence can give you more data and said data is more useful to know than a separate attribute.
Depending on the complexity of the system and its user requirements, hard-coding roles as an enum could fall anywhere on the spectrum from a good idea to a bad one. It would be a terrible choice if user-defined roles were a requirement, because an enum can't model a dynamic set of ad-hoc, user-defined groups. It could be a good choice for a very simple site that just needs to ship ASAP. Planning carefully and defensively for evolving requirements, without over-optimizing, over-engineering, or adding too much extra code, is part of the balance that must be struck.
Booleans beget more booleans. Once you have one or two argument flags, they tend to proliferate, as programmers try to cram more and more modalities into the same function signature. The set of possible inputs grows with 2^N, but usually not all of them are valid combinations. This is a source of bugs. Again, enums / sum-types solve this because you can make the cardinality of the input space precisely equal to the number of valid inputs.
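A sketch of what that looks like in a function signature. `fetch_with_flags`, `fetch`, and `CachePolicy` are invented for the example: two flags give four call shapes, one of which has to be rejected at runtime, while the enum parameter has exactly three members.

```python
from enum import Enum


def fetch_with_flags(key: str, use_cache: bool = False, refresh_cache: bool = False) -> str:
    # 2 flags -> 4 combinations, but refreshing a cache you do not use is meaningless.
    if refresh_cache and not use_cache:
        raise ValueError("refresh_cache requires use_cache")
    mode = "refresh" if refresh_cache else ("use" if use_cache else "bypass")
    return f"{key}:{mode}"


class CachePolicy(Enum):
    BYPASS = "bypass"
    USE = "use"
    REFRESH = "refresh"


def fetch(key: str, policy: CachePolicy) -> str:
    # Every member of the enum is a valid input; there is nothing to reject.
    return f"{key}:{policy.value}"


print(fetch("profile", CachePolicy.REFRESH))
```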