“The purpose of abstraction is not to be vague, but to create a new semantic level on which one can be absolutely precise.” (Edsger Dijkstra)


My goal on this site is to explain why it is worth the effort to produce data objects and pictures that not everyone is able to read, and why that matters so much in the IT security field.

Like this one, a nice example of a hard-to-read but readable illustration below:

(It is actually a case similar to this botnet, with 4 C&C servers.)
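For readers who would like to try something similar, here is a minimal sketch of how such a bot/C&C relationship graph could be drawn in Python with networkx and matplotlib. The node names, edge list and layout are purely illustrative assumptions, not the data behind the picture above.

```python
# A minimal sketch (illustrative data only): drawing a botnet-style graph
# with 4 C&C servers as hubs and a cloud of infected hosts around them.
import random

import matplotlib.pyplot as plt
import networkx as nx

random.seed(42)
G = nx.Graph()
cc_servers = [f"C&C-{i}" for i in range(1, 5)]   # the four command-and-control nodes
bots = [f"bot-{i:03d}" for i in range(30)]       # hypothetical infected hosts

# Each bot talks to one or two randomly chosen C&C servers.
for bot in bots:
    for cc in random.sample(cc_servers, k=random.choice([1, 2])):
        G.add_edge(bot, cc)

pos = nx.spring_layout(G, seed=42)               # force-directed layout
nx.draw_networkx_nodes(G, pos, nodelist=cc_servers, node_color="red", node_size=300)
nx.draw_networkx_nodes(G, pos, nodelist=bots, node_color="lightgray", node_size=60)
nx.draw_networkx_edges(G, pos, alpha=0.4)
nx.draw_networkx_labels(G, pos, labels={c: c for c in cc_servers}, font_size=8)
plt.axis("off")
plt.savefig("botnet_graph.png", dpi=150)
```

A force-directed layout tends to pull the C&C servers into visible hubs, which is what makes this kind of picture readable despite the number of nodes.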


Brief examples of areas where data science is beneficial:

  • Business in general: it can help you explain complex situations to your clients or management with ease.
  • IT security: it can help you find new patterns in network traffic, user behavior, infections, malicious code structure and spam, and it supports incident response and the enumeration phase of penetration testing. The example below shows a company intranet reachable from the Internet (a small traffic-graph code sketch also follows after this list):
  • Marketing (PR, SEO, …): appealing visuals for different audiences, market analysis, web analysis, keyword analysis.
  • Social networks: all the automated recommendations of friends, feeds, music and goods.
  • HR: job market analysis, detailed job-seeker profiles, various maps of demand across different areas of interest.
  • ITIL and quality management: training maps, application structure maps, bug tracking, service performance visualization, progress reporting. The example below is a training management map:
  • Life science and clinical/pharma: don't even get me started here :-)
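To make the IT security bullet more concrete, here is a minimal sketch of how connection logs could be turned into a host-relationship graph and checked for internal hosts reachable from outside. The column names, sample rows and the 10.x internal prefix are assumptions for illustration only.

```python
# Minimal sketch: turning connection logs into a host-relationship graph.
# The log format (src, dst, bytes columns) is an assumption for illustration.
import networkx as nx
import pandas as pd

# Hypothetical flow log: one row per observed connection.
flows = pd.DataFrame({
    "src":   ["10.0.0.5", "10.0.0.5", "10.0.0.7", "203.0.113.9"],
    "dst":   ["10.0.0.10", "10.0.0.11", "10.0.0.10", "10.0.0.10"],
    "bytes": [1200, 800, 54000, 300],
})

# Aggregate traffic per (src, dst) pair and build a weighted directed graph.
edges = flows.groupby(["src", "dst"], as_index=False)["bytes"].sum()
G = nx.from_pandas_edgelist(edges, source="src", target="dst",
                            edge_attr="bytes", create_using=nx.DiGraph)

# Flag internal hosts that are contacted from non-internal addresses,
# i.e. intranet services visible from the Internet.
internal = {n for n in G.nodes if n.startswith("10.")}
exposed = {dst for src, dst in G.edges if src not in internal and dst in internal}
print("Internal hosts contacted from outside:", exposed)
```

The same graph object can then be drawn or exported to a visualization tool, which is how pictures like the intranet example above typically come together.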

… And yes, it is a very exciting process! I really enjoy it!

(Note: detailed explanations of the pictures will be added as separate posts to keep this one simple. No private or confidential data have been or will be disclosed whatsoever.)

HOW IS IT DONE:

  1. I take all available data that are complex enough for my algorithms. If needed, anonymisation takes place, and from that point on I work only with data stripped of all confidential information. Doing this step properly is not as easy as it sounds.
  2. Then I use different approaches to identify any significant relations.
  3. The last step is to figure out the best way to visualize them.
  4. The output of this process is a set of pictures, with no data contained inside them, and several data files which may or may not contain the raw data. (A minimal code sketch of these steps follows below.)
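As a rough illustration of these steps (not my actual tooling), the sketch below hashes assumed confidential columns, uses a simple correlation matrix as the "significant relations", and writes out one picture plus one derived data file. File names and column names are hypothetical.

```python
# Minimal sketch of the steps above, under assumed column names:
# (1) strip/hash confidential fields, (2) look for significant relations,
# (3) export a picture plus the derived (non-confidential) data file.
import hashlib

import matplotlib.pyplot as plt
import pandas as pd

df = pd.read_csv("raw_events.csv")        # hypothetical input file

# 1. Anonymisation: replace identifying columns with salted hashes.
SALT = "change-me"
for col in ("username", "src_ip"):        # assumed confidential columns
    df[col] = df[col].astype(str).map(
        lambda v: hashlib.sha256((SALT + v).encode()).hexdigest()[:12])

# 2. Relation hunting: here, a simple correlation matrix of numeric fields.
relations = df.select_dtypes("number").corr()

# 3. Visualisation: a heatmap saved as a picture, the derived matrix as data.
plt.imshow(relations, cmap="viridis")
plt.xticks(range(len(relations)), relations.columns, rotation=90)
plt.yticks(range(len(relations)), relations.columns)
plt.colorbar()
plt.tight_layout()
plt.savefig("relations.png", dpi=150)
relations.to_csv("relations.csv")         # contains no raw confidential data
```

In practice the "relations" step is where most of the work goes; correlation is just the simplest stand-in for the graph, clustering or pattern-mining approaches a real case may need.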


Do you have any questions or feedback? Talk to me here.