September 4, 2024

Identifying Cepaeas by the External Traits


Intro

This is a post about identifying the cepaea snails from photos and about all the little bits of knowledge that aren't explicitly recorded elsewhere. It will be long, very detailed, convoluted and it will mostly be of interest to those who already have some experience with Cepaea and ID them a lot. (I will tag some people who may be interested: feel free to add or disagree in the comments. I don't expect most of this info will be new to you, but maybe some of it will. @susanhewitt @sunnysnail @angus10 @mathijs_zonneveld @gural-sverlova)

So you have genus Cepaea and the two species, nemoralis and hortensis. How do you separate them? In principle you're supposed to look at the love arrows:




hortensis




nemoralis


There are other anatomical differences as well. But on iNaturalist it isn't going to happen. All you have are external photos. Even outside the context of INat there are times when all you find is an empty shell. Also, dissecting is icky. So can you reliably ID cepaeas at all? Yes, but there are complications. It's not so hopeless that you'd have to throw your hands in the air and say "no dissection = no ID", but also more complicated than just looking at the edge of the opening.


Primary Traits

The first and the most basic thing you learn is that the two species differ by the color of the lip, that thing around the opening. Dark brown for nemoralis, white for hortensis, like this:




nemoralis




hortensis


This is good enough up to a point, but there are going to be problems with it.

The second thing you learn (and the first problem you encounter) is that the lip color is not totally reliable. There are "reverse color cepaeas" - white lipped nemoralis and black lipped hortensis.

For now I'm going to ignore this issue and treat "black/white lipped snail" as interchangeable with "nemoralis/hortensis", but I will return to this issue later in the Reverse Color Cepaeas section, where I will ask: how serious is this problem?

If you ignore the reverse color cepaeas, identifying starts to sound simple again - you look at the color of the lip and base your ID on that, but unfortunately there are still further complications. If you're not aware of them you're going to badly misidentify lots and lots of specimens.

To start with: what is a lip, anyway? Is a lip simply the edge of the opening? No. A lip is a specific structure, a thickening of the shell that only mature snails develop. If the snail is not mature, there isn't going to be a lip. Then it doesn't matter what color you think you can see on the edge of the opening: it's irrelevant for identification purposes. How do you tell the difference?

This is largely covered in a good post by Susan J. Hewitt. I'm going to quote the most important bits:

"How can you tell if the snail is an adult? In snails of this genus (and in many other land-snail genera), once the snail reaches adulthood/sexual maturity, the shell stops growing any larger, and instead it grows thicker. In particular, the lip (the very edge of the opening of the shell) in adults becomes greatly strengthened, strongly reinforced, and also somewhat out-turned, a bit flared-out. So, in adults the lip of the shell is thick and strong, and it is out-turned to a certain degree."
"A live juvenile or subadult Cepaea snail that is active will always appear to have a white lip on the shell. But what you are seeing is usually the live mantle tissue which is wrapped over the edge of the shell, actively laying down more shell material. That is how the shell increases in size. And any brand-new shell material will also appear whitish, yellowish, or even transparent. This apparent pale lip is not an indication that the shell is mature."

This phenomenon gives rise to the "false hortensis" - immature cepaeas that superficially appear to have a white lip, leading to a very common identification mistake. For example, do you think these are hortensis?

If you were to think that, you'd be wrong - nemoralis all of them. The edge of the opening may appear white here, but it tells you nothing about the species. A lip hasn't developed here. What you want to see is the opening 1) expanding outwards and 2) thickening, like this:

These are legitimate hortensis, or perhaps you may want to say "cepaeas with legitimate white lips".

What about these ones?

Here you can clearly see the opening expanding outwards. So are these mature hortensis? No, still not. Still not mature, although they are very close. The issue here is that the lip has already expanded but hasn't thickened yet. You can tell from the fact that it's partially translucent (you can see the faint "veins" through it). How else can you tell the difference? Compare a proper, totally mature specimen with a 99%-mature-but-not-quite-there one:




This is a real mature hortensis. You do not see the veins through the shell. Notice a pattern of color change on the lip: first a bright pale strip, then the very edge which seems a bit darker, which is actually because it's faintly transparent. [observation][ImgLic]




This is an immature cepaea that may well be nemoralis. Notice the veins, notice the other pattern on the lip: first it's dark, (because it's transparent), then the very edge is light (it's also transparent, but here you can see the mantle through it), there's no clear white strip - this is a pattern of a not totally mature cepaea. Notice how this can also be seen in the 3 mature cepaeas vs 3 not totally mature ones above: totally mature ones have a defined white strip, in immature ones you have something darker and muddy looking. [observation][ImgLic]


If you think this represents too much caution, look at the little experiment Angus did:




Look at the snail with bands. The opening is expanded and the edge is white, has to be hortensis, right?




Then, 4 days later look at it again. Oh no.




4 more days after that and it's obviously nemoralis.[Link]


When you have a very well-observed area with many thousands of cepaea photos where only nemoralis lives (e.g. New York), one or two snails out of those thousands will be exactly on the very edge of adulthood and look very, very convincingly like hortensis, but there will be only a couple of them. In cases like that, I think it's good to be extra strict and classify everything except the completely and entirely convincing specimens as nemoralis. When there are many convincing hortensis observations, then the standards can perhaps be lowered a little. This, for the record, is why I think there are zero hortensis in New York City or Washington state.
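The reasoning above is basically a base-rate argument, and it can be sketched with Bayes' rule. Every number below is a made-up assumption for illustration, not a measurement: `prior_hortensis` is the share of hortensis among the local cepaeas, `p_look_given_hortensis` is how often a real hortensis shows a convincing white lip, and `p_look_given_nemoralis` is how often a nemoralis on the edge of adulthood mimics one.

```python
def posterior_hortensis(prior_hortensis, p_look_given_hortensis, p_look_given_nemoralis):
    """Probability that a convincing-looking white-lipped snail really is hortensis."""
    true_pos = prior_hortensis * p_look_given_hortensis
    false_pos = (1.0 - prior_hortensis) * p_look_given_nemoralis
    return true_pos / (true_pos + false_pos)

# Hypothetical area where hortensis is essentially absent (assumed prior: 1 in 100000)
rare = posterior_hortensis(1e-5, 0.9, 0.0005)
# Hypothetical area where both species are assumed to be equally common
mixed = posterior_hortensis(0.5, 0.9, 0.0005)

print(f"hortensis nearly absent: {rare:.3f}")
print(f"both species common:     {mixed:.4f}")
```

With these assumed inputs the same convincing-looking photo is weak evidence in the first area and strong evidence in the second, which is the sense in which the standards can be relaxed where hortensis is known to be common.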

Photos of "false hortensis" found their way into many places, see for example the Wikipedia media gallery - it's a mess: https://commons.wikimedia.org/wiki/Cepaea_hortensis. Many snails pictured there can't be confidently called hortensis, and some, in fact, may be pretty confidently called nemoralis. (In general Wikipedia image galleries are not well-identified. They don't have anything resembling a robust ID consensus system over there.)

At this point things may be starting to feel too gloomy: there are so many disclaimers about this whole lip thing, and it's so hard to tell - is it properly developed, is it not properly developed. Turned out some seemingly identifiable snails were unidentifiable, after all. Here is some good news though: in order to call a snail "nemoralis" what you want is not a black lip as such, but the presence of the black pigment. And fortunately for the identifiers, the black pigment begins to form on the columella of the shell before it forms on the lip, like this:

When you see something like that, it's safe to say you are holding nemoralis then. Knowing this can save many observations of juveniles from being unidentifiable. Of course, the absence of black pigment does not imply hortensis (except very weakly). On a young juvenile there will be nothing in either case.

Another thing that might cause confusion and mistakes is bleaching. When a snail lives in a very humid environment (and especially after the snail dies) the outer layer of the shell begins to degrade and turn white. Because of this you can find some nemoralis shells with seemingly white lips. In cases like that, try to locate the still intact parts of the shell and base the identification on them, if there are any. Bleaching can be recognized by how it makes the surface matte and chalky in texture.

These all have brown lips, but it's not so easy to see with all the bleaching. In cases like these, as always, it's useful to look closely. When you notice the bleaching on the outer shell, you should adjust your expectations of how dark the lip is going to be - very likely it too, will seem considerably lightened.




Do you think this is hortensis?




It is not.



Secondary Traits

So far we went over the primary identification characteristics - the love arrow cross-section and the lip color. What if a cepaea is juvenile? There's no lip so you can't use that, and there aren't going to be love arrows either and maybe it's a photo, so you couldn't see the love arrows anyway. DNA analysis is always possible, but far too fancy. Is it unidentifiable to the species then?

I think not. Hortensis and nemoralis differ not only in their primary traits, but in many other ways as well, and you can also use these differences for identification. Another reason why being aware of these secondary traits can be useful, is that they can help you spot the potential reverse color cepaeas, the ones where the color of the lip is atypical. If the secondary traits tell you one thing, and the lip color - another, tread carefully.

The more secondary traits you're aware of, the more effectively you can form a gestalt impression, and the more reliable it will be. However, identifying that way should be done cautiously - secondary traits aren't perfectly reliable and aren't always applicable. If they were, you would call them primary traits. Often the presence of a trait can tell you something, while its absence tells you essentially nothing either way.

*1. Cepaea hortensis rarely exhibits any banding pattern other than 12345 or 00000, therefore other patterns are fairly strong evidence of nemoralis. It can happen though - see the wikipedia gallery: https://commons.wikimedia.org/wiki/Cepaea_hortensis
On the other hand, the merging of bands seems more frequent in hortensis, especially the total merge, (12345).

This is what nemoralis banding patterns typically look like:




Ordinary and common 12345, no merging. 00000 is also common. [observation][ImgLic]




003(45), red base tone - a common pattern [observation]




00300, yellow base tone - a very typical nemoralis pattern. [observation] [ImgLic]




12045 - this is not very common, but it illustrates the variety of unusual banding patterns in nemoralis. [observation] [ImgLic]


Hortensis, on the other hand:




Yellow 00000, unbanded. [observation][ImgLic]




And a 12345. This (give or take band merging), and 00000 is what the vast majority of hortensis looks like. [observation] [ImgLic]




Bands have merged, but it's a (12345) all the same. [observation] [ImgLic]




Missing pigmentation, still 12345. Maybe I should say -1, -2, -3, -4, -5. [observation]





Red 00000 is also possible though a little uncommon. [observation][ImgLic]




More band merging, maybe (123)(45). [observation] [ImgLic]




Exceptions are possible: this is probably 10305, not 12345 at all. [observation] [ImgLic]




Sometimes bands become pale and reddish. In nemoralis this is possible, but rare. You should probably not describe this as 0.5, 1, 1.5, 2, 2.5. [observation] [ImgLic]


*2. In particular, the 00300 pattern is very common in nemoralis and especially rare in hortensis. It is so unusual, that when you see a cepaea with 00300 and a white lip I think it's pretty likely to be the reverse color nemoralis, actually. Meaning that the "00300 = nemoralis" correlation may just be more reliable than the "white lip = hortensis" correlation.

There are some photos of snails with white lips and 00300 pattern around:




This one may possibly be legit. [observation][ImgLic]




This one is likely just a white lip nemoralis, misidentified. See the size, the location - over 2 cm, Pyrenees. [observation] [ImgLic]


Meanwhile on INat:




Probably real hortensis. [observation][ImgLic]




Probably real as well. [observation] [ImgLic]




While this one is probably nemoralis. [observation] [ImgLic]




Probably nemoralis, misidentified. [observation] [ImgLic]


There's a number of observations of white-lipped 00300 cepaeas from the West of France, Paris and the Pyrenees that are identified as hortensis but are probably nemoralis in reality.

*3. A combination of the red base tone and the dark bands simply doesn't seem to occur in hortensis at all. It is therefore also very strong nemoralis evidence.




Usually either 00300 [observation][ImgLic]




...or 00345. [observation] [ImgLic]


Hortensis can be red, but if it is, there will be no bands. I'm not sure if there exists a single photo of a red, white lipped cepaea with black bands on the internet, at all. If it does, I'm fairly sure it's reverse color nemoralis.

*4. Band number 5 being tighter in nemoralis, wider in hortensis - a reliable trait (or not?)
See this post for a discussion: apparently, not everyone agrees with this notion.
There's an idea that hortensis differs from nemoralis by the radius of the 5th band, sort of like this:

Here, I cherrypicked the specimens to make the difference easily visible. How reliable is this trait? Apparently, roughly this reliable:

These are the distributions of the band widths in the dataset that I got. You can see that this trait is neither totally reliable nor totally useless.

*5. Nemoralis is bigger. This is a refreshingly simple way to distinguish the two species: they pretty clearly separate by size.

There's not a whole lot of overlap, meaning that almost all nemoralis are bigger than almost all hortensis. Still, there are always exceptional individuals and you often can't estimate the size very well when dealing with photos.

*6. Nemoralis is flatter, hortensis is closer to a ball shape. Ratio of height to width is usually higher in hortensis. This is another subtle and not totally reliable trait, but sometimes it helps. Perhaps this is simply another manifestation of point 5, the size difference. Maybe bigger cepaeas always tend to be flatter, and small ones - rounder.




On the left - reverse color nemoralis, on the right - hortensis. When the coloration is exactly the same, the eye can appreciate the difference in shape. From H. Zell image gallery.[ImgLic]


*7. Reddish or pink apex usually implies nemoralis. It is rarely seen in hortensis, except when the whole thing is red. Maybe this is just another way to state point 3.

Sometimes the apex of a cepaea shell has a color a little different from the general base tone - brighter and more saturated, like this:




On hortensis it will almost always be yellow, hardly ever red.[observation][ImgLic]




In nemoralis it can be yellow or it can also be red like this. [observation] [ImgLic]




There are exceptions, like this red-ish tipped hortensis, but this is uncommon. [observation] [ImgLic]



Reverse Color Cepaeas

Let's take a look at them, nemoralis first:




Nemoralis from Italy. White lip in nemoralis can go hand in hand with colorless bands. It's unclear whether this one was identified anatomically or not. Nemoralis like this one seems to be common in northern Italy, but very rare in most other places.[observation][ImgLic]




One from the Pyrenees, likely nemoralis by size and location. [observation] [ImgLic]


Here's a photo of a pair of white lipped nemoralis, these have been identified anatomically. Not much there that would have alerted the identifier except the size and the 5th band.

And now hortensis:




From Lviv. Image comes from this paper (the link seems not to work now) by Gural-Sverlova & Gural. Notice the 5th band.




From H. Zell collection. [observation] [ImgLic]




I collected this one in cape Kolka, a location where brown lipped hortensis are known to occur, though I did not ID that one anatomically. The ID here is based on size and location among other things and is debatable. [observation]


More on INat: https://www.inaturalist.org/observations/161374476

You can see that none of these have a properly brown lip of solid dark color, such as nemoralis typically would have. Here, the color is lighter and you can see the white substance through it. Looking from the outside (from the side of the body whorl, that is) in many cases the lip doesn't even look dark at all. This might give identifiers some hope that hortensis like that can be spotted with no dissection, perhaps.

How much of a problem are those, really?

Not that much of a problem, I think. At least, it's mitigated by two facts: 1) you have all the previously mentioned secondary traits that can help you out and alert you when the lip color is pointing the wrong way and 2) reverse color cepaeas seem to mostly be localized to specific areas. In some places they appear to be altogether absent.

Where I am, in Latvia that is, white-lipped nemoralis seems to not exist. I've never found any and there seems to be no specimens in the national museum. Latvia is by no means the only place like that. On INat all the observations of white lipped snails with the secondary nemoralis traits are concentrated in a few areas - Pyrenees, Northern Italy, west of France, and essentially nowhere else. There are almost none in North America. There must be more out there than I know about, going unnoticed, but they can't be common. Any population of white lipped nemoralis would be producing the suspicious white-lipped 00300 snails, and we see very few of those.

Brown lipped hortensis may be more troublesome. There are few good secondary traits indicating hortensis, and so the reverse color specimens are going to be more insidious and harder to sniff out. It's also unclear where they are localized, if anywhere. I don't think they can be common though, because if they were, we'd see lots of "transitional" hortensis with pale pink lips, somewhere between black and white, and we see very few. Likewise, there are probably very few brown lipped hortensis.


Afterword

It was possible to write this and assemble all these photos because people have been good enough to release them under the sufficiently permissive licenses. My life would have been even easier if all the photos were in public domain and then I'd save all those hours that needed to be spent on painstakingly providing attributions. The lesson here is: renounce copyright and make the whole world wealthier with every free photo. Your (c)-s will do you no good. They will, however, add an extra layer of difficulty to writing a (debatably) useful post.

Posted on September 4, 2024 09:45 AM by tasty_y | 5 comments

August 24, 2024

The Fifth Circle

There's an idea that hortensis differs from nemoralis by the radius of the 5th band - wide in hortensis, more tight in nemoralis, sort of like this:

Here, I cherrypicked the specimens to make the difference easily visible. These things vary. But I mean - just look at them:

These aren't cherrypicked and the difference is pretty obviously there. Right?
But no. Apparently certain people think differently. Hannah J. Jackson, Jenny Larsson and Angus Davison (hi @angus10) co-authored this paper here https://onlinelibrary.wiley.com/doi/full/10.1002/ece3.7517 about the positions of bands on cepaea snails and not only did they not replicate the effect, but they even found a significant weak effect in the opposite direction:

What on Earth is that supposed to mean? How? Why?
Whatever. I'll do my own analysis. I'll go to Wikipedia and grab all the cepaea photos from the H. Zell image gallery here and here, all very nice and conveniently photographed in the same positions, and I'll measure this sort of thing:

I'll measure a and b, and then watch how a/b quantity is distributed by species.
(It wasn't immediately obvious to me, but this was a bit dumb and it would have been better to measure the diameter of the ring and the shell instead of trying to measure radii, because it would get rid of the subjectivity of trying to find the center, but I didn't think of that until I already measured a bunch of shells and I wasn't going to redo too much work.)
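For anyone who wants to repeat the measuring, the bookkeeping can be sketched roughly like this. The numbers are invented placeholders, not the actual dataset, and the names `measurements` and `ratios_by_species` are made up for this example; `a` is the radius of the 5th band and `b` the radius of the shell, both in pixels from the same assumed center.

```python
from statistics import mean, stdev

# Hypothetical measurements in pixels: (species, a, b).
# a = radius of the 5th band, b = radius of the shell, same assumed center.
measurements = [
    ("hortensis", 212.0, 400.0),
    ("hortensis", 198.0, 390.0),
    ("hortensis", 220.0, 410.0),
    ("nemoralis", 150.0, 405.0),
    ("nemoralis", 171.0, 420.0),
    ("nemoralis", 160.0, 398.0),
]

def ratios_by_species(rows):
    """Group the dimensionless a/b ratio by species."""
    grouped = {}
    for species, a, b in rows:
        grouped.setdefault(species, []).append(a / b)
    return grouped

ratios = ratios_by_species(measurements)
for species, values in sorted(ratios.items()):
    print(species, len(values), round(mean(values), 3), round(stdev(values), 3))
```

Because a/b is a ratio, the photos don't need a common scale. Measuring diameters instead of radii would feed the same function unchanged, only with diameter columns in place of `a` and `b`.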

Anyway, after I did the measuring I got myself this dataset here. I excluded 3 alleged hortensis from Pyrenees that in actuality were probably misidentified nemoralis (spoilers: this move doesn't make much of a difference for final analysis.) This is what I got, an approximation of distribution of the band radius as a fraction of the shell radius by species:

Visibly pretty different and consistent with the idea that hortensis have wider bands. Let's do statistics, to be sure. I was unsure what statistical test would be the best here to reject the null that both distributions have the same mean. There's Welch's t-test that assumes that both distributions are normal and starts with H0 that mean_1 = mean_2 (doesn't assume same variance). The assumption of normality is a bit dubious here. There's Mann–Whitney U test that tests the H0 that distributions are the same, with no assumptions. Let's do both. I pasted data into these calculator things:
https://www.statskingdom.com/150MeanT2uneq.html
https://www.statskingdom.com/170median_mann_whitney.html
and got p = 0.00008822 for the slightly dubious Welch test and p = 0.0001173 for the Mann–Whitney test. Which is pretty much what you'd expect: yeah, the distributions are different, the means aren't the same. Not sure which tests Jackson et al used for the same particular comparison in their paper.
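If you'd rather not depend on an online calculator, the Mann-Whitney part can be sketched in plain Python. This is a minimal version under stated assumptions: it uses the normal approximation with no continuity or tie correction, so its p-value won't match the calculator exactly, and the two lists are invented placeholder ratios rather than my data. Welch's test is left out because its p-value needs the t-distribution, which isn't in the standard library.

```python
from math import sqrt
from statistics import NormalDist

def mann_whitney_u(xs, ys):
    """Two-sided Mann-Whitney U test, normal approximation, no tie correction."""
    pooled = sorted([(v, 0) for v in xs] + [(v, 1) for v in ys])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        # Find the run of equal values starting at i and give them the average rank
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    n1, n2 = len(xs), len(ys)
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u1 = rank_sum_x - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mu) / sigma
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return u1, p

# Hypothetical a/b ratios, for illustration only
hortensis = [0.53, 0.51, 0.54, 0.50, 0.55, 0.52, 0.56, 0.49]
nemoralis = [0.37, 0.41, 0.40, 0.44, 0.39, 0.42, 0.38, 0.43]

u, p = mann_whitney_u(hortensis, nemoralis)
print(u, p)
```

For small samples or lots of ties an exact or tie-corrected version would be the safer choice.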

So yeah, the result replicates on photos stolen from Wikipedia. What does it mean? Why did that paper find the opposite result, and highly significant, too?

Some things about the Jackson paper: first, as far as I understand they excluded the shells with band merging. Understandable move if you want to measure band positions, but what if that's where this particular effect was hiding? Maybe it only manifests when the bands merge? Second, a whole lot of cepaeas (over 200) in their dataset came from Pyrenees, and were dead-collected. Were they accurately IDd? Pyrenees are an evil place where you may possibly find white lip nemoralis. Did that distort the results? (Still though - render them significant in the opposite direction?) To be fair - you can also criticize the Zell dataset I based my numbers on - it was never meant to be representative, rather the opposite - it was assembled to showcase various rare banding patterns. You could argue that is distorting things. I don't think so though. Ultimately, more than p values I trust looking at picture 2. Still curious about the paper!

Posted on August 24, 2024 07:53 PM by tasty_y | 0 comments

July 24, 2023

Snailing Links

Here are all the good links for identifying molluscs that I know (besides the obvious). I'll collect them here mostly for my own convenience, but also for whoever may need them. Sadly, not all the materials are in English. Feel free to post your favorite links in the comments.

Posted on July 24, 2023 06:42 PM by tasty_y | 10 comments

July 9, 2023

Dubia vs Pumila

Splitting clausilia dubia and c. pumila is notoriously hard, and there aren't many good photos on the internets anywhere, especially not of pumila. I'll try to put some helpful info here for my own use and convenience as much as for anybody else reading it.

My own comparison photos:

Photos by Aleksandr Anichtchenko, size not to scale:

Photos from a book by Digna Pilāte, no unambiguous attribution:

My own doodle with a summary, exaggerated:

This looks to be the case where the overall shape matters as well as the opening grooves - pumila seems noticeably longer and club-shaped, this is true.

Posted on July 9, 2023 03:49 PM by tasty_y | 1 comment

June 17, 2023

Amber sea, garnet sand

Didn't find any Rangia cuneata today, but saw plenty of dark red sand on the beach. It's not a rare phenomenon, but a temporary one: when the waves come at a certain speed at a certain angle, they drag the sand particles along the bottom, and the more heavy and dense grains are less prone to moving so they accumulate in some places and you get these bruise-colored patches of deposits. It's the same process as the one that was used to extract gold from sand in the past. Garnet happens to be quite dense, so it's one of the minerals that tends to accumulate in this fashion. Here's what it looks like:

I collected some samples and put them under magnification:

Turned out there were lots of grains of all sorts of colors. Pink and orange ones are garnets, I assume, and the occasional green ones - olivine. Colorless grains are ordinary quartz, but what could the really dark ones be? Maybe they came from basalt or something.

Posted on June 17, 2023 03:54 PM by tasty_y | 0 comments

June 8, 2023

Senseless caution, how it could be addressed

Something I think could be improved about the design of INat, as a website.

Problem: what I call "senseless caution", both from the human identifiers and the AI: IDing something as a single-species taxon, instead of that species, for example "genus Elona" instead of "Elona quimperiana", or I think "Class Ginkgoopsida" is a striking example with only Ginkgo biloba in it. There's never a reason to register such IDs and they can always be improved without any loss of accuracy. (Logically this should apply to the single genus families and so on, all single-descendant taxons.)

Solution 1: hardcode AI to straight up never recommend single-species taxons and recommend more precise ones instead. Indeed, many such IDs come from silly automatic recommendations.
(More controversially: to be honest, I'd like to go further and forbid AI to ever suggest "Genus Cornu", "Genus Arianta" and the like. Giant waste of time, that.)

Solution 2: Put an "Improve" button on the ID form (next to "Agree" and "Compare") that will only appear on the senselessly cautious IDs. Pressing it would automatically register an ID of the single species in the taxon that appeared in the original ID. (That is: you press the "Improve" button on a "genus Elona" ID and it automatically adds an "Elona quimperiana" ID.)

Solution 3, low impact: add an icon to the senselessly cautious ID (perhaps a blue exclamation point). It would do nothing at all, except that when you moused over it you'd see a text along the lines of: "This identification could be improved without any loss of accuracy by replacing it with taxon such and such." Purely informative.
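All three solutions rely on the same small check: is the identified taxon the top of a chain of only-children, and if so, what is the most precise taxon at the bottom of that chain? A minimal sketch of that check, assuming a toy `children` mapping invented for this example (this is not INat's actual data model or API):

```python
# Hypothetical miniature taxonomy: each taxon maps to its direct children.
# Species are leaves and map to an empty list.
children = {
    "Genus Elona": ["Elona quimperiana"],
    "Elona quimperiana": [],
    "Class Ginkgoopsida": ["Order Ginkgoales"],
    "Order Ginkgoales": ["Family Ginkgoaceae"],
    "Family Ginkgoaceae": ["Genus Ginkgo"],
    "Genus Ginkgo": ["Ginkgo biloba"],
    "Ginkgo biloba": [],
    "Genus Cepaea": ["Cepaea nemoralis", "Cepaea hortensis"],
    "Cepaea nemoralis": [],
    "Cepaea hortensis": [],
}

def most_precise_equivalent(taxon, children):
    """Follow the chain of only-children down; stop at a leaf or at a real fork."""
    current = taxon
    while len(children.get(current, [])) == 1:
        current = children[current][0]
    return current

def is_senselessly_cautious(taxon, children):
    """True when the ID could be made more precise with no loss of accuracy."""
    return most_precise_equivalent(taxon, children) != taxon

for name in ("Genus Elona", "Class Ginkgoopsida", "Genus Cepaea"):
    print(name, "->", most_precise_equivalent(name, children))
```

The "Improve" button would register whatever `most_precise_equivalent` returns, and the informational icon would only need `is_senselessly_cautious`. A genus with two species, like Cepaea, is left alone.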

Objection: I can't think of any strong objections, but you could say: "what if we replace all "genus Elona" with "Elona quimperiana", and then a new Elona species is described, or a taxonomic change happens. Then a clean-up will be required." I think this is reasonable but in practice much less of an annoyance than the one that senseless caution currently creates.

(Not posting this on the forum)

Posted on June 8, 2023 08:24 AM by tasty_y | 0 comments

May 25, 2023

Questions and Answers with G.

Today I met Malacologist G. and got to see her private collection. (I'll need to be slightly cryptic about this for privacy reasons.)

It was a lot of fun! If I had the opportunity, I could spend the entire day looking at each shell individually, one by one. I got to see the cool local species I haven't found yet and give G. my Physa acuta sample. Got to ask the questions I always wanted to. I should write down all that I learned while it's still fresh in my memory:

1) Q: Is there really Alinda biplicata in Latvia? A: Nah.

2) Q: Is there really Ferrissia californica? A: probably not really, it can't stand sub-zero temperatures, maybe.

3) Q: What's the difference between perpolitas? A: I still can't make much sense of it, will need to look at a lot more pictures.

4) Q: Where can I find Cochlicopa lubricella? A: Check the dry places. Also, it's supposed to be pale when alive?

5) Q: Is there really Monachoides incarnatus in Latvia? A: Probably not really. Didn't even have a sample.

6) Q: Is there Oxyloma sarsii in Latvia? A: Somebody may have found it in a greenhouse.
6.1) Q: But could it be like, totally abundant all over the place and it's just that nobody even bothered to dissect and find out? A: May be!

7) Q: will it be possible to donate my collection to the museum when I'm old and frail? A: No promises or guarantees. Not even with data. (This is depressing. I may just be forced to sell all of it.)

8) Q: Why isn't all the Latvian data on Gbif yet? A: They are working on it. Something-something bureaucracy. Something-something need to hire programmers. (This is depressing. It's overwhelmingly important to get all that on Gbif. Thank goodness my data is in there.)

Never even got to ask about Gastrodontoids.

  • Supposedly there are some Hydrobia species in Latvia that there were no samples of and that would be good to find.
  • G. is in the camp "no, you can't tell Ambersnails apart based on the shells at all, not even s. putris vs genus oxyloma, no, don't even try, it's hopeless". It's true that God created Theodoxus as a gift to malacologists, and Ambersnails - as a punishment for their hubris. Also, apparently you need to prepare them in some weird way before dissecting them, i.e. not in alcohol.
  • G. would be interested in Rangia cuneata samples. Oh, I better find some!
  • G. is totally aware that Cochlicopa nitens tends to have an S-shaped columella. That's right, Bernhard! It does! It does have it!
  • The reason I never found gyraulus crista is that crista is way, way tiny and fragile. I probably missed it a hundred times. Will need to level my dirt-digging skill for this.

We commiserated in that:

  • Pisidium is unbearable pain
  • Gyraulus is somewhat bearable pain.
  • Radix is sadface.

Posted on May 25, 2023 09:20 AM by tasty_y | 0 comments

August 15, 2022

Old drawings

Once upon a time many years ago, I was in a financially unpleasant situation and I hatched a plan: I would paint and sell some pictures (of seashells, because of course). That plan never got even as far as offering anything to anybody, but I did paint some pictures. Shells are fun to paint, and you also get to feel like Ernst Haeckel or some other scholar-illustrator of old.

Can you ID all the shells?

(I don't intend to sell the pictures any longer, don't get the idea that this is an ad.)

Posted on August 15, 2022 04:14 PM by tasty_y | 4 comments

August 15, 2021

1 is a Large Sample Size, Actually

There's a particular conversation that plays out on the internet about 100 times every day. It goes like this:

Person 1: I've noticed that something weird is going on. I have this data, and it's not the way you would normally expect it to be. Strange!

Person 2: Well, how many data points have you got?

Person 1: 20.

Person 2: Ah, well you see this is a very small sample size. We can't draw any conclusions from your data, because there's just too little of it, so more data is needed and we have no reason to think anything strange could be going on just yet.

Person 1: I see! You are very wise: 20 is a very small sample size indeed.

What do you think about this conversation? Is Person 2 very wise and prudent to see number 20 and say it's too small to draw conclusions from?

This conversation plays out over and over, countless times. Person 2 never bothers to say what sample size would be big enough for their liking, never bothers to specify what sort of model they are working under, never bothers to look at the actual data Person 1 is providing and crunch the numbers to find out the p-value. All they know is that no matter how many data points are presented, you can always look at them and sagely declare that it's too few and there's no reason to suspect anything odd. 20? Too small. 200? Too small. 20000? Still too small.

How big of a sample size do you actually need to tell anything? Turns out, you can't answer that question in a vacuum. It completely depends on your assumptions (H0) and on what sort of data you got.

Imagine that you think that the probability to look outside your window and see a polar bear is 1 in a billion. That's your null hypothesis. You conduct one single experiment: you look outside and behold - a polar bear waving its paw at you. What do you say now, do you say: "well, 1 is a very small sample size, nothing strange is going on here, no reason to suspect my assumption was stupid"? No. You say: "under my null hypothesis there's 1 in a billion chance to produce the result I obtained. So the hypothesis can be safely rejected with a high level of confidence. (p=10^-9)". You can perform more experiments, but you don't need to: with this null hypothesis and this data, 1 is a completely sufficient sample size.

Let's look at another example. Somebody hands you a die and tells you it's fair, that is, equally likely to land on any number. You roll the die 5 times, and every time it produces 1. So, do you have any reason to think there might be something fishy with it, or must you say that 5 is a small sample size and it doesn't mean anything? Well, do the arithmetic. Under the hypothesis that the die is fair, the probability that it will produce data this skewed (the same face five times in a row, whichever face that is) is 6 × (1/6)^5 = 1/6^4 = 1/1296 ≈ 0.0008. Certainly you can say that it's still reasonably high, certainly you can roll the die a couple more times if you wish to be more sure. But notice how with the sample size of 5 we can already reject the initial assumption at p ≈ 0.0008. This is better than what lots of published papers can claim. Turns out, with this data 5 is a pretty healthy sample size! (It won't always be so: if you rolled the die 10 times and the results were more mixed, perhaps 10 wouldn't be enough).
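The die arithmetic is easy to check by brute force. The sketch below enumerates every possible sequence of five rolls of a fair die and counts two events: all five rolls showing the same face (the "data this skewed" event, probability 1/6^4), and the stricter event of all five rolls showing a 1 specifically.

```python
from fractions import Fraction
from itertools import product

rolls = 5
outcomes = list(product(range(1, 7), repeat=rolls))  # all 6**5 equally likely sequences

# Every roll shows the same face, whichever face that is
all_same = sum(1 for seq in outcomes if len(set(seq)) == 1)
# Every roll shows a 1 specifically
all_ones = sum(1 for seq in outcomes if set(seq) == {1})

p_all_same = Fraction(all_same, len(outcomes))
p_all_ones = Fraction(all_ones, len(outcomes))

print(p_all_same, float(p_all_same))
print(p_all_ones, float(p_all_ones))
```

Which of the two events is the right one depends on the question asked before rolling: "is this die biased at all?" corresponds to the first, "is this die biased towards 1?" to the second.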

The silliest real-life example of this that I've seen was when one person recorded the outcomes of the random effect in a video game. They seemed weird and unfair to him, and he recorded about 150 of them. He finished the post by declaring that of course it doesn't mean much, after all 150 is a very small sample size. When we worked through the numbers, it turned out that his data (or data at least that weird) had a 1/50000 probability of occurring under the assumption that the game was fair. His sample was more than big enough, and still he insisted that it was probably too small (without crunching any numbers, of course).
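"Crunching the numbers" in a case like this usually means an exact binomial tail. Here is a sketch with invented inputs, since I don't have the original data: suppose the game claims a 50% chance of a favorable outcome and the player saw only 48 favorable outcomes in 150 tries. The function name and all three numbers are assumptions for illustration.

```python
from fractions import Fraction
from math import comb

def binomial_lower_tail(k, n, p):
    """Exact P(X <= k) for X ~ Binomial(n, p), with p given as a Fraction."""
    q = 1 - p
    return sum(comb(n, i) * p**i * q**(n - i) for i in range(k + 1))

# Hypothetical: 48 or fewer favorable outcomes in 150 tries of a fair 50/50 effect
tail = binomial_lower_tail(48, 150, Fraction(1, 2))
print(float(tail))
```

Using `Fraction` keeps the sum exact, so the tiny tail probability isn't lost to floating-point rounding. The data and the null hypothesis decide whether 150 is enough, not the number 150 by itself.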

Moral of the story: there's no such thing as "small sample size" in isolation from the data and the null hypothesis. Don't go around confidently declaring that some number less than a billion is a "small sample size" without doing the hard work of calculating what number would be sufficient, without looking at the data and formulating your null hypothesis. We learn statistics so that we can notice that something is weird when there is reason to, not so we can dismiss literally anything shown to us as a coincidence and "small sample size".

Posted on August 15, 2021 07:21 PM by tasty_y | 0 comments