
@shauray8 commented Apr 10, 2024

Switch the safety checker to Jaccard distance so that it is more forgiving with SFW prompts/images.

Compared to cosine distance, Jaccard distance is considered more forgiving because it only considers the presence or absence of features. This is still not perfect for what we want, because the safety model itself is built on a lot of shaky structure: mainly a comparison against a 17-element vector of concepts, which basically is:

```yaml
nsfw:
  concepts:
    sexual: 0.2
    nude: 0.20
    sex: 0.206
    18+: 0.21
    naked: 0.195
    nsfw: 0.2
    porn: 0.2
    dick: 0.19
    vagina: 0.19
    naked child: 0.22
    explicit content: 0.19
    uncensored: 0.2
    fuck: 0.2
    nipples: 0.2
    visible nipples: 0.21
    naked breasts: 0.214
    areola: 0.2
```

That is not a very extensive list: it does not include terms like "killing" or "blood", and underneath it is basically a `CLIPVisionModel`.
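To make the "vector compare" concrete, here is a minimal Python sketch, assuming the checker compares one image embedding against one embedding per concept using cosine similarity and flags the image when a similarity exceeds that concept's number from the list above. Reading those numbers as thresholds, the function names, and the toy 3-dimensional embeddings are all assumptions for illustration, not the actual safety-checker code.

```python
import math

# A few entries copied from the concept list above. Treating these numbers as
# per-concept similarity thresholds is an assumption made for this sketch.
CONCEPT_THRESHOLDS = {
    "sexual": 0.2,
    "nude": 0.20,
    "naked child": 0.22,
    "areola": 0.2,
}


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def flagged_concepts(image_embed, concept_embeds, thresholds):
    """Return the names of concepts whose similarity to the image exceeds their threshold.

    image_embed: the image embedding (e.g. from a CLIP vision encoder).
    concept_embeds: hypothetical mapping of concept name -> concept embedding.
    thresholds: mapping of concept name -> threshold.
    """
    return [
        name
        for name, concept_embed in concept_embeds.items()
        if cosine_similarity(image_embed, concept_embed) > thresholds[name]
    ]


# Toy 3-dimensional embeddings standing in for real CLIP embeddings.
toy_concepts = {"sexual": [1.0, 0.0, 0.0], "nude": [0.0, 1.0, 0.0]}
print(flagged_concepts([1.0, 0.0, 0.0], toy_concepts, CONCEPT_THRESHOLDS))  # ['sexual']
```

Swapping `cosine_similarity` for a different measure (such as a Jaccard-based one) is the kind of change this patch is about; the thresholding loop itself stays the same.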

Here (experimental: it might or might not work), the Jaccard distance is calculated with some rough estimations that just happened to work on the small dataset I had.

I also think cosine distance is more influenced by differences in feature magnitudes and term frequencies. So this patch makes a small change to the similarity measure; it is not meant as a long-term fix!
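To illustrate the difference, a minimal Python sketch comparing cosine distance with a Jaccard distance computed on binarized vectors. The rule "a feature is present when its value exceeds `eps`" is a hypothetical binarization, not necessarily the estimation used in this patch, and the two vectors are made up:

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity; depends on how magnitude is spread across features."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm else 1.0


def jaccard_distance_binary(a, b, eps=0.0):
    """Jaccard distance after reducing each vector to the set of 'present' features.

    A feature counts as present when its value is greater than eps; this
    binarization rule (and the default eps) is a hypothetical choice.
    """
    present_a = {i for i, x in enumerate(a) if x > eps}
    present_b = {i for i, x in enumerate(b) if x > eps}
    union = present_a | present_b
    if not union:
        return 0.0  # two vectors with no present features are treated as identical
    return 1.0 - len(present_a & present_b) / len(union)


# Same features present, very different magnitudes.
a = [3.0, 0.1, 0.0]
b = [0.1, 3.0, 0.0]
print(round(cosine_distance(a, b), 3))  # 0.933
print(jaccard_distance_binary(a, b))    # 0.0
```

Cosine distance already divides by both vector norms, so uniformly rescaling a vector does not change it; what it reacts to is how the magnitude is distributed across features. The binarized Jaccard distance discards that information, which is what makes it more forgiving, and also what makes it blunter.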

For those wondering:

This is what the Jaccard distance looks like:

$$d_J(A, B) = 1 - \frac{|A \cap B|}{|A \cup B|}$$
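The formula translates directly into code. A minimal Python sketch on plain sets; the empty-set convention and the example sets are illustrative choices, not part of this patch:

```python
def jaccard_distance(a: set, b: set) -> float:
    """1 - |A ∩ B| / |A ∪ B| for two sets."""
    union = a | b
    if not union:
        return 0.0  # convention: two empty sets have distance 0
    return 1.0 - len(a & b) / len(union)


A = {"nude", "naked", "porn"}
B = {"naked", "porn", "uncensored"}
print(jaccard_distance(A, B))  # 1 - 2/4 = 0.5
```

Only membership matters here: how strongly a feature is present never enters the calculation.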

@shauray8 marked this pull request as ready for review April 10, 2024 20:51
@shauray8 (Author)

@adhikjoshi

@shauray8 self-assigned this Apr 10, 2024
@adhikjoshi

Understood, looks good



3 participants