I would be lying if I said that coding came to me easily. It's been years of stepping back to square one, but it's something I have pursued with continued tenacity because the thrill of getting something to work makes you forget the frustration of debugging.
I've managed to stick with Python as a programming language the longest because it suits my brain to learn a tool that can work with visuals and mathematical functions. As I continue with the weekly Dear Data project, I find that I am drawn to unstructured data or metrics that capture qualitative information.
As usual, I aimed a little high and stumbled along the way. It's within these stumbling blocks, however, that the exploration happens. This week's project began with modeling my usage of text messages. I wanted to quantify how I was communicating using emojis (the frequency of usage, the time, the length of messages, etc.) in order to look at how effectively emojis communicate emotions.
That's enough questions to start a PhD. I've got seven days.
Lesson № 1: Not all unicodes are equal
Emojis vary widely across operating systems: your iOS hamburger has a different stacking order than your Android burger. They can have the same Unicode code point, but different IDEs will read them differently. (I will have to look into this further.) Code with emojis that was started in Drawbot will not work in Sublime and vice versa, even when running on the same Python version.
Not only will some emojis look different (Android used to convert an iOS cookie emoji into a cracker), but some simply don't exist across platforms. 🥳 Party hat emoji, not universal. (I haven't even gotten into browsers yet.)
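One way to see what is going on underneath the pictures is to print the code points behind each emoji. This is only a minimal sketch, not the script from this project; the `describe` helper is a made-up name, and the emojis are written as escape sequences so the code does not depend on how an editor displays them.

```python
import unicodedata

def describe(text):
    """Return a (code point, Unicode name) pair for each character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "UNNAMED")) for ch in text]

party = "\U0001F973"                       # the party hat face
facepalm = "\U0001F926\u200D\u2640\uFE0F"  # one visible emoji, several code points

for label, emoji in [("party", party), ("facepalm", facepalm)]:
    # len() counts code points, not the pictures you see on screen
    print(label, len(emoji), describe(emoji))
```

The second emoji is a single picture built from a base emoji, a joiner, a gender sign, and a variation selector, which is part of why the same text can look or behave differently from one tool to the next.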
Lesson № 2: The effort may not be worth it (but you'll learn something)
After all this time testing and debugging to get my script to count emojis, the results were not that interesting. The emojis themselves were pretty expressive, so making a data viz drawing of them felt like flattening the information. Some of the more boring data I've collected in previous weeks yielded much more interesting visuals. Counting emojis resulted in a not-so-complex data set.
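For anyone curious what counting emojis might look like, here is a rough sketch. It is not the script described above. The pattern assumes that anything in two particular Unicode ranges is an emoji, which is a simplification, and `count_emojis` and the sample messages are invented for illustration.

```python
import re
from collections import Counter

# Rough assumption: treat any character in these two ranges as an emoji.
# This misses some emojis and counts the pieces of joined emojis separately.
EMOJI_PATTERN = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")

def count_emojis(messages):
    """Tally every emoji-like character across a list of message strings."""
    counts = Counter()
    for message in messages:
        counts.update(EMOJI_PATTERN.findall(message))
    return counts

# Hypothetical sample messages, with emojis written as escapes
sample = ["on my way \U0001F354", "yay \U0001F973\U0001F973", "ok"]
print(count_emojis(sample).most_common())
```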
Lesson № 3: It never hurts to review the fundamentals
I started looking at what other information I could extract from these messages, and how I could show something interesting without leaking private details. One of the first things I learned with the Hear Me Code group in DC is how to measure the length of a string. That way, I can start to see patterns in when the messages were sent and how long they were. (Exchanges with my sister read like novelas.) Responses with emojis were sufficiently short.
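Measuring message length comes down to Python's built-in `len()`. The sketch below groups lengths by the hour a message was sent. The data format (timestamp and text pairs), the sample messages, and the `average_length_by_hour` name are all assumptions made up for this example.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical sample data: (timestamp, message text) pairs
messages = [
    (datetime(2020, 1, 6, 9, 15), "running late, see you soon"),
    (datetime(2020, 1, 6, 9, 40), "ok"),
    (datetime(2020, 1, 6, 21, 5), "you will not believe what happened today"),
]

def average_length_by_hour(messages):
    """Map each hour of the day to the average message length in characters."""
    lengths = defaultdict(list)
    for sent_at, text in messages:
        lengths[sent_at.hour].append(len(text))
    return {hour: sum(values) / len(values) for hour, values in lengths.items()}

print(average_length_by_hour(messages))
```

Only the lengths and hours leave the function, so the message text itself never has to appear in a chart.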
As with most difficult things, I made new discoveries along the way. I had a chance to practice cleaning data: stripping extraneous punctuation and spaces, and setting everything in lowercase, because Python is case sensitive and considers a word at the beginning of a sentence a different string than the same word in the middle of a sentence. In thinking about what else this could be relevant for, I set out to write a script to import and read a text file, parse the string into separate words, and return the most used words! The most used word was "the". 🤦‍♀️
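The cleaning and counting steps can be sketched roughly like this. It is one possible version, not the exact script from this week; the function names are invented, and `string.punctuation` only covers ASCII punctuation, so curly quotes and similar characters would slip through.

```python
import string
from collections import Counter

def clean_words(text):
    """Lowercase the text, strip ASCII punctuation, and split on whitespace."""
    lowered = text.lower()
    no_punctuation = lowered.translate(str.maketrans("", "", string.punctuation))
    return no_punctuation.split()  # split() also discards extra spaces

def most_used_words(text, n=3):
    """Return the n most common words as (word, count) pairs."""
    return Counter(clean_words(text)).most_common(n)

def most_used_words_in_file(path, n=3):
    """Same as most_used_words, but reads the text from a file first."""
    with open(path, encoding="utf-8") as f:
        return most_used_words(f.read(), n)

sample = "The cat sat. The dog sat, too! the end"
print(most_used_words(sample, 2))
```

Without the lowercase step, "The" and "the" would be counted as two different words.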
Solutions beget more problems, but I trekked through reviewing split and remove functions, loops and counters. Hooray! My top used words are "type," "design" and "otherskillschemistry."
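Getting past "the" means skipping common filler words before counting. The sketch below does that with a small filter. The stop-word list is a tiny made-up one, and the filter is a list comprehension instead of the split-and-remove loops mentioned above, so treat it as an illustration only.

```python
from collections import Counter

# Assumed, deliberately tiny stop-word list; a real one would be much longer
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it"}

def top_words_without_stop_words(words, n=3):
    """Count words after dropping anything in STOP_WORDS."""
    kept = [word for word in words if word not in STOP_WORDS]
    return Counter(kept).most_common(n)

# Hypothetical input, already lowercased and free of punctuation
words = "the type of design is the type to study".split()
print(top_words_without_stop_words(words, 1))
```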
Another day....