
Data Mining in C

“Tsoding Daily”

References:
– Wikipedia – K-means Clustering
– “Less is More: Parameter-Free Text Classification with Gzip”

Chapters:
– 0:00:00 – TBD


33 Comments

  1. Man, I love your videos in C. I am slowly starting to love C; my passion for C is growing because of you. Thanks a lot, Tsoding. I hope one day I will send minor patches to the Linux kernel.

  2. Lloyd’s algorithm can be used to create a mesh called a centroidal Voronoi tessellation. I once used it to generate a mesh on a sphere with non-uniform density. That would be pretty cool to make, and it basically uses the same algorithm as the one you implemented.

  3. Yeah, the samples are actually denser in the center, because a point at a larger magnitude is just as probable as one at a shorter magnitude, so the same number of points lands on a large circumference as on a small one, and on the large one they are spread much more sparsely. It's a generate_cluster() problem, not a rand() problem. You could generate points in a square that is 2 radii in width and height and only keep the points that fall within the radius to get a uniform distribution.
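
A minimal sketch of that rejection-sampling idea (`rand_float()` and `uniform_disc_point()` are hypothetical helper names, not code from the video): draw candidates in the disc's bounding square and keep only those inside the radius.

```c
#include <stdlib.h>
#include <math.h>

// Hypothetical helper: uniform float in [0, 1).
static float rand_float(void)
{
    return (float) rand() / ((float) RAND_MAX + 1.0f);
}

// Rejection sampling: pick points in the 2r-by-2r bounding square
// and retry until one falls inside the disc. The accepted points
// are uniformly distributed over the disc's area.
static void uniform_disc_point(float radius, float *x, float *y)
{
    for (;;) {
        float px = (2.0f * rand_float() - 1.0f) * radius;
        float py = (2.0f * rand_float() - 1.0f) * radius;
        if (px*px + py*py <= radius*radius) {
            *x = px;
            *y = py;
            return;
        }
    }
}
```

On average about pi/4 (~79%) of the candidates are accepted, so the retry loop is cheap in 2D.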

  4. As seen in a video by mathemanic called "The numerical simulation is not as easy as you think": the phenomenon where the clusters are denser at the center can be fixed by setting the magnitude equal to the square root of a uniform random variable between 0 and 1. That is (as others have pointed out), the area of a circle does not grow at a constant rate as you increase the radius; it grows with the square of the radius.

  5. 44:04 Maybe it's denser in the center for the same reason that if you take sticks of equal length and place them with their ends at one point (4 sticks look like +, 5 sticks look like *), the whole thing's center is dense (the biggest wood-to-air ratio by volume).

  6. If you replace the commas with NUL bytes you can use the C string APIs directly, without the temporary buffer. That way the CSV is actually a sequence of null-terminated strings. You do need to keep track of the newline and replace that with a NUL as well.
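
A sketch of that in-place approach (`split_csv_line()` and its signature are illustrative, not from the video), assuming no quoted fields containing commas:

```c
#include <string.h>
#include <stdlib.h>

// Tokenize one CSV line in place by overwriting each comma with
// '\0', so every field becomes a NUL-terminated string usable with
// the standard C string APIs directly, with no temporary buffer.
static size_t split_csv_line(char *line, char **fields, size_t max_fields)
{
    // Replace the trailing newline (if any) with a NUL as well.
    line[strcspn(line, "\r\n")] = '\0';

    size_t n = 0;
    while (n < max_fields) {
        fields[n++] = line;
        char *comma = strchr(line, ',');
        if (comma == NULL) break;
        *comma = '\0';
        line = comma + 1;
    }
    return n;
}
```

Note the buffer must be writable (an array or heap copy), not a string literal, since the separators are overwritten.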

  7. Hi!
    Why aren't all your utils like nob_da_append/... on the nobuild GitHub? Is it your "custom" version? It would be really cool to have them there!

    Happy New Year!!

  8. You could visualize the high-dimensional data by running PCA to reduce the dimensions. In your case you can do PCA down to dimension 2, and what you would obtain is a 2-dimensional vector whose 2 values have the largest “explained variance”; this basically means those 2 directions contribute more to the variance in the data than any other 2.

    You would be able to do the clustering in the high dimension and just display it using the PCA.

  9. The reason the samples are denser in the center is that you were generating them by randomizing the magnitude and then the angle. Randomizing the magnitude uniformly means the probability that a sample lies within distance r*mag of the center is mag itself. However, the area in which those samples can be placed (pi*(r*mag)**2) doesn't grow at the same rate as r*mag; thus, the greater mag is, the lower the density. The Bertrand paradox illustrates this phenomenon really well.

  10. rand() is uniform, but to scatter points uniformly on a disc, you need to use mag = sqrtf(rand_float()).
    Otherwise you will get the same number of points per magnitude value (on average), which means points near the center will be closer to each other.
