Data Mining in C
“Tsoding Daily”
References:
– Wikipedia: K-means Clustering
– Less is More: Parameter-Free Text Classification with Gzip
Chapters:
– 0:00:00 – TBD
Happy New Year everybody!
i see Emacs I like
Thank you... same to you, a peaceful 2024!
NOB_GO_REBUILD_URSELF Technology™
Man, I love your videos in C. I am slowly starting to love C, and my passion for it is growing thanks to you. Thanks a lot, Tsoding. I hope one day I will send minor patches to the Linux kernel.
@42:49 You say "Disney lawsuit incoming," but the mouse is in the public domain now. You're safe.
He uses makefiles for everything but c
Lloyd's algorithm can be used to create a mesh called a centroidal Voronoi tessellation. I once used it to generate a mesh on a sphere with non-uniform density. That would be pretty cool to make, and it basically uses the same algorithm as the one you implemented.
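For readers who want the loop this comment refers to, here is a minimal sketch of one Lloyd iteration on 2D points: assign every point to its nearest mean, then move each mean to the average of its points. The `Point` type and the `lloyd_step` name are made up for illustration and are not taken from the stream's code.

```c
#include <assert.h>
#include <float.h>
#include <stddef.h>

typedef struct { float x, y; } Point;  // hypothetical type for this sketch

static float dist_sq(Point a, Point b)
{
    float dx = a.x - b.x, dy = a.y - b.y;
    return dx*dx + dy*dy;
}

// One Lloyd iteration over n points and k means.
// labels[i] holds the cluster index of pts[i] and is updated in place.
// Returns how many points changed cluster; 0 means the loop has converged.
static size_t lloyd_step(const Point *pts, size_t n,
                         Point *means, size_t k, size_t *labels)
{
    assert(k > 0);
    size_t changed = 0;

    // Assignment step: each point goes to its nearest mean.
    for (size_t i = 0; i < n; ++i) {
        size_t best = 0;
        float best_d = FLT_MAX;
        for (size_t j = 0; j < k; ++j) {
            float d = dist_sq(pts[i], means[j]);
            if (d < best_d) { best_d = d; best = j; }
        }
        if (labels[i] != best) { labels[i] = best; changed++; }
    }

    // Update step: each mean moves to the centroid of its points.
    for (size_t j = 0; j < k; ++j) {
        float sx = 0.0f, sy = 0.0f;
        size_t count = 0;
        for (size_t i = 0; i < n; ++i) {
            if (labels[i] == j) { sx += pts[i].x; sy += pts[i].y; count++; }
        }
        // An empty cluster keeps its old mean in this sketch.
        if (count > 0) means[j] = (Point){ sx / (float)count, sy / (float)count };
    }
    return changed;
}
```

Calling `lloyd_step` repeatedly until it returns 0 gives the usual k-means loop. Running it on dense samples of a region is one way to approximate a centroidal Voronoi tessellation.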
happy new year!!
I love your videos buddy. Keep doing what you are doing. You teach well.
Yeah, the samples are actually denser in the center: every magnitude is equally likely, so a large circumference gets, on average, the same number of points as a small one and ends up much sparser. It's a generate_cluster() problem, not a rand() problem. You could generate points in a square that is 2 radii wide and tall and keep only the points that fall within the radius to get a "uniform" distribution.
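The square-then-filter idea from this comment could look roughly like the sketch below. `Point`, `rand_float`, and `sample_disc_rejection` are illustrative names, not the code from the stream, and `rand()` stands in for whatever random source the program actually uses.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct { float x, y; } Point;  // hypothetical type for this sketch

// Uniform float in [0, 1], built on rand() for simplicity.
static float rand_float(void)
{
    return (float)rand() / (float)RAND_MAX;
}

// Rejection sampling: draw points uniformly from the bounding square
// (2 radii wide and tall) and keep the first one inside the disc.
// On average about 1 in 4.66 tries is rejected (1 - pi/4 of the square).
static Point sample_disc_rejection(Point center, float radius)
{
    assert(radius > 0.0f);
    for (;;) {
        float dx = (rand_float() * 2.0f - 1.0f) * radius;
        float dy = (rand_float() * 2.0f - 1.0f) * radius;
        if (dx*dx + dy*dy <= radius*radius) {
            return (Point){ center.x + dx, center.y + dy };
        }
    }
}
```

Because the square is sampled uniformly and the filter only discards points, the survivors stay uniform over the disc.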
Simple data set := Iris sepal/petal. (3 clusters, 4 dimensions)
Tsoding. Do you know why you are using CLITERAL?
Why doesn't this video show up in my subscriptions page? It showed up in the homepage but not in the subscription page. Am I the only one experiencing this?
As seen in a video by mathemanic called "the numerical simulation is not as easy as you think": the phenomenon where the clusters are denser at the center can be fixed by setting the magnitude to the square root of a random variable between 0 and 1. That is because (as others have pointed out before) the area of a circle does not grow at a constant rate as you increase the radius; it grows with the square of the radius.
Mickey Mouse goes into the public domain, and this is what people do with it? 🤔 lol
44:04 maybe it's denser in center for same reason as if you take same length sticks and place them with ends at one point (4 sticks look like +, 5 sticks look like *), then whole thing's center is dense (biggest wood/air ratio by volume)
If you replace the commas with nulls, you can use the C APIs directly without the temporary buffer. That way the CSV is actually a sequence of null-terminated strings. You do need to keep track of the newlines and replace those with nulls as well.
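A minimal sketch of that in-place trick, assuming a simple CSV with no quoted fields and plain `\n` line endings. The function name and signature are made up for illustration. Row boundaries are not recorded here, so the caller is assumed to know the column count and index fields as `row * cols + col`.

```c
#include <assert.h>
#include <stddef.h>

// Splits a NUL-terminated CSV buffer in place by overwriting every ',' and
// '\n' with '\0'. Pointers to the start of each field are stored in `fields`
// (up to max_fields of them). Returns the total number of fields found.
// Limitations of this sketch: no quoting, no "\r\n", and an empty field at
// the very end of the buffer (for example after a trailing newline) is dropped.
static size_t csv_split_in_place(char *buf, char **fields, size_t max_fields)
{
    assert(buf != NULL && fields != NULL);
    size_t count = 0;
    char *start = buf;
    for (char *p = buf; ; ++p) {
        char c = *p;
        if (c == ',' || c == '\n' || c == '\0') {
            if (!(c == '\0' && p == start)) {
                if (count < max_fields) fields[count] = start;
                count++;
            }
            if (c == '\0') break;
            *p = '\0';       // terminate the field where the separator was
            start = p + 1;   // next field begins right after it
        }
    }
    return count;
}
```

Each stored pointer is now an ordinary C string, so it can be passed straight to functions such as `strtof` or `atoi` without copying into a temporary buffer.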
How about that guys
Hi !
Why are all your utils like nob_da_append/.. not on the nobuild GitHub? Is it your "custom" version? It would be really cool to have them there!
Happy new year !!
You could visualize the high-dimensional data by running PCA to reduce the dimensions. In your case you can do PCA with 2 components, and what you would obtain for each sample is a 2-dimensional vector whose 2 values are its coordinates along the directions with the largest "explained variance". This basically means that those 2 combinations of features capture more of the variance in the data than any other 2 directions.
You would be able to do the clustering in the high dimension and just display it using the PCA.
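As a rough illustration of the PCA suggestion, the sketch below computes the covariance matrix of the samples and finds only the first principal direction by power iteration. All names and the `MAX_DIM` limit are assumptions made for this example. A second direction for a 2D plot would need an extra step such as deflation, which is not shown.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DIM 16  // arbitrary limit for this sketch

// Covariance matrix (dim x dim, row-major) of n samples.
// `data` holds the samples row-major: sample i, feature d is data[i*dim + d].
static void covariance(const float *data, size_t n, size_t dim, float *cov)
{
    assert(n > 1 && dim > 0 && dim <= MAX_DIM);
    float mean[MAX_DIM] = {0};
    for (size_t i = 0; i < n; ++i)
        for (size_t d = 0; d < dim; ++d)
            mean[d] += data[i*dim + d];
    for (size_t d = 0; d < dim; ++d) mean[d] /= (float)n;

    for (size_t a = 0; a < dim; ++a) {
        for (size_t b = 0; b < dim; ++b) {
            float sum = 0.0f;
            for (size_t i = 0; i < n; ++i)
                sum += (data[i*dim + a] - mean[a]) * (data[i*dim + b] - mean[b]);
            cov[a*dim + b] = sum / (float)(n - 1);
        }
    }
}

// Power iteration: repeatedly multiply v by cov and rescale.
// v converges towards the direction of largest variance, scaled so that
// its largest-magnitude component equals 1 (not unit length).
// Caveat: the all-ones start vector fails if it happens to be orthogonal
// to the answer; real code would randomize the start.
static void top_component(const float *cov, size_t dim, float *v, int iters)
{
    assert(dim > 0 && dim <= MAX_DIM);
    float w[MAX_DIM];
    for (size_t d = 0; d < dim; ++d) v[d] = 1.0f;
    for (int it = 0; it < iters; ++it) {
        float pivot = 0.0f;
        for (size_t a = 0; a < dim; ++a) {
            w[a] = 0.0f;
            for (size_t b = 0; b < dim; ++b) w[a] += cov[a*dim + b] * v[b];
            float mag  = w[a]  < 0.0f ? -w[a]  : w[a];
            float best = pivot < 0.0f ? -pivot : pivot;
            if (mag > best) pivot = w[a];
        }
        if (pivot == 0.0f) break;  // no variance along v, nothing to rescale
        for (size_t a = 0; a < dim; ++a) v[a] = w[a] / pivot;
    }
}
```

Projecting each mean-centered sample onto the resulting direction gives one display coordinate, while the clustering itself can still run in the original dimensions.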
36:10 best resolution random: ((double)(unsigned int)GetRandomValue(-2147483648,2147483647))/4294967295.0
Is a group of React developers a degenerate cluster?
The reason the samples are denser in the center is that you were generating them by randomizing the magnitude and then the angle. Randomizing the magnitude uniformly means the probability that a sample lies within r*mag of the center grows linearly with mag. However, the area in which those samples can be placed (pi*(r*mag)**2) doesn't grow at the same rate as r*mag. Thus, the greater mag is, the lower the density. The Bertrand paradox illustrates this phenomenon really well.
But they use the k-nearest neighbours algorithm in the paper, not k-means.
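The area argument can be written out as a short derivation. The symbols are assumptions for this note: R is the disc radius, \rho is the sampled distance from the center, and u is uniform on [0, 1].

```latex
% Uniform density on the disc means the probability of landing within
% distance r of the center is proportional to the enclosed area:
P(\rho \le r) = \frac{\pi r^2}{\pi R^2} = \frac{r^2}{R^2}, \qquad 0 \le r \le R.

% Inverting this CDF with u \sim U(0, 1) gives the sampling rule:
u = \frac{\rho^2}{R^2} \;\Longrightarrow\; \rho = R\sqrt{u}.

% With the naive choice \rho = R u, the radial density is 1/R, and spreading
% it over a ring of circumference 2\pi r gives a density per unit area of
\frac{1}{R} \cdot \frac{1}{2\pi r} = \frac{1}{2\pi R\, r} \;\propto\; \frac{1}{r},
% which blows up towards the center.
```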
Rand is uniform, but to scatter points uniformly on a disc, you need to use mag=sqrtf(rand_float()).
Because otherwise, you will have the same amount of points per magnitude value(on average), which means points near the center will be closer to each other.
Do you have some tip for people who really wanna learn C? I don't see many good courses out there or even tutorials with development patterns.
Your uni truly must have been shit if they didn't even go over k-means clustering
Loved this stream <3
happy new year folks. great way to start the new year with a tsoding video 🎉
i love watching these vods while writing code myself, although your projects are consistently more interesting than mine.
first
Pog! New zozin video just dropped