Understanding brain imaging data – 65,000 shades of gray.

I wanted to explain how NIfTI, the file format for structural and functional MRI images, works. NIfTI stands for Neuroimaging Informatics Technology Initiative. The format supports data types ranging from 8-bit integers to 128-bit complex values, signed or unsigned; the most common implementation is 16-bit signed integer storage. NIfTI images usually have the file extension *.nii, or come as (*.hdr and *.img) pairs. This image format is important for two reasons: 1) it stores the spatio-temporal details of the imaging sequence, and 2) it supports compression for better space management.

Anyone who has undergone a brain scan knows that the picture of the brain from an MRI or CT scanner is usually a grayscale image with 65,536 shades of gray. The raw files from the scanner are usually in the DICOM (Digital Imaging and Communications in Medicine) format, with the extension *.dcm. DICOM is similar to the RAW image format used by cameras: instead of the sensor read-outs stored in RAW files, DICOM images store scanner read-outs.

Each scan of a subject usually produces several DICOM files, which is both an advantage and a disadvantage. For sharing specific image slices, DICOM is extremely useful. But most interpretation tasks require the full image set, and a few loose slices from the scanner are much less useful. This is where the NIfTI format comes to the rescue.

Since the format stores the entire sequence in a single file, the problems of managing a large number of files are eliminated. And because the images are stored in order, interpreting a specific image in the context of the images preceding and succeeding it becomes easier.

There is another important advantage of NIfTI. From an analytical point of view, brain imaging data is most useful when treated as a 3D data structure. Even though the individual components of a NIfTI file are 2D slices, interpretation becomes more reproducible when we treat them as a single 3D volume. For this purpose, NIfTI is the best format to work with.

An example is the use of a machine learning tool called 3D convolutional neural networks (CNNs). A 3D CNN provides the 3D spatial context of a voxel. For image sequences like brain scans, identifying anatomical structures or abnormalities requires that spatial context. The 3D CNN approach is very similar to watching a video and trying to identify what the scene is about; instead of video scene recognition, a 3D CNN can be trained to detect specific features in a brain scan.
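To make the idea of 3D spatial context concrete, here is a minimal base-R sketch (the volume and voxel coordinates are made up for the example) that extracts the 3×3×3 neighborhood of a voxel, the kind of local patch a 3D CNN convolves over:

```r
# Toy 3D volume: 10 x 10 x 10 voxels with synthetic intensities
volume <- array(seq_len(1000), dim = c(10, 10, 10))

# Voxel of interest (i, j, k)
voxel <- c(5, 5, 5)

# Extract its 3 x 3 x 3 neighborhood: the local spatial context
neighborhood <- volume[(voxel[1] - 1):(voxel[1] + 1),
                       (voxel[2] - 1):(voxel[2] + 1),
                       (voxel[3] - 1):(voxel[3] + 1)]

dim(neighborhood)  # 3 3 3
```

A 3D CNN slides a window like this across the whole volume, learning a weight for each offset, which is what gives each voxel its spatial context.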

This work is done as part of our startup project nanoveda. For continuing nanoveda’s wonderful work, we are running a crowdfunding campaign using gofundme’s awesome platform. Donation or not, please share our crowdfunding campaign and support our cause.

Donate here: gofundme page for nanoveda.

Artificial Intelligence Cancer Cloud computing Deep learning Doctor Healthcare Nanoveda Neuroscience Open source R Software Virtualization

Virtualization – Matryoshka dolls of computing.

A few weeks back, I talked about various open operating systems for efficiently running some of the deep learning and simulation models. I switched back and forth between six different flavors of Linux before finally settling on one. This experimentation phase is helpful in the long run.

But for folks who want to run one particular toolkit from the convenience of their preferred operating system, there is an alternative: virtualization, and one piece of software in particular: Docker.

Virtualization is the computing equivalent of Matryoshka dolls. A host computer can run multiple operating systems inside it, and one can nest a virtual machine within a virtual machine within a physical machine. This layered approach to operating systems has made software applications somewhat platform agnostic.

I love Docker on Windows 10. The caveat is that the OS has to be 64-bit and the processor must support the extensions that allow hardware-level virtualization, often referred to as Intel VT-x (or AMD-V on AMD processors). Docker prefers Microsoft Hyper-V to run its Linux virtual machines inside Windows. On systems that do not meet these requirements, it is possible to force Docker to use VirtualBox's implementation of virtual machines instead.
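As a sketch of that VirtualBox fallback (assuming the docker-machine tool that shipped alongside Docker at the time; the machine name 'default' is just an example):

```shell
# Create a Linux VM backed by VirtualBox instead of Hyper-V
docker-machine create --driver virtualbox default

# Print the environment variables that point the Docker client at the VM
docker-machine env default
```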

Modern internet applications, including cloud computing, are built on virtualization. With the advent of hardware extensions supporting virtual machines, the performance difference between a physical machine and a well-configured virtual machine on the right host is negligible.

For newcomers to Linux who are more comfortable with off-the-shelf consumer hardware, virtualization is an easy entry point to some of the awesome tools for deep learning and computational simulation. Once you learn the tricks of the trade, there is always the option to move your applications to a physical machine or run them with a cloud provider.

This work is done as part of our startup project nanoveda. For continuing nanoveda’s wonderful work, we are running a crowdfunding campaign using gofundme’s awesome platform. Donation or not, please share our crowdfunding campaign and support our cause.

Donate here: gofundme page for nanoveda.

(The image of the Docker whale logo is from the Docker blog.)

Doctor Doctors without borders Healthcare Management Médecins Sans Frontières Non profit

Democracy and science – What South Sudan teaches us.

I had an incredible opportunity to participate in a Doctors Without Borders (MSF) initiative to fill in missing geographic information on satellite images. This process is incredibly important for working out operational details for aid agencies and non-profit organizations like MSF.

This includes resource allocation, rapid disaster relief, quick response to public health crises like epidemic outbreaks, administration of vaccines to children, and many other life-saving efforts. Our mission for the day was to help fill in housing details in a region called Aweil in South Sudan.

I consider MSF one of the most important organizations in the world. When MSF won the Nobel Peace Prize in 1999, I was a teenager hoping to find my next mission in life, and reading about some of MSF's incredible life-saving efforts was one of the strong motivators for me to become a doctor.

But yesterday's mission of helping fill in the missing mapping information for South Sudan made me realize another important fact: the importance of investing in transparent democracy, science and technology, even in resource-poor settings.

According to 2013 World Bank data, South Sudan has a per-capita GDP of $1,044.99. The country has a steady source of revenue from exporting oil, which accounts for nearly 40% of GDP. This gives South Sudan another distinction: the most oil-dependent economy in the world.

At a time when most nations around the world are pledging to invest more in renewable energy, and the global economy is shifting away from oil, how can a young upstart like South Sudan cope with these changes? The answer: early investment in science and technology.

Despite robust oil revenues, systemic inefficiencies in the South Sudanese economy mean that most of that money never benefits the citizens of this East African nation. Even today, because of these inefficiencies, oil revenue contributes almost nothing to economic development in South Sudan.

The financial infrastructure in South Sudan is virtually non-existent, and the military acts as a bank distributing currency to the public, which creates a huge conflict of interest. Due to the lack of transparency in how the country's revenues are handled, the cash distribution system largely fails to address the poverty and social issues that riddle South Sudanese society.

At a time when democracies around the world are racing to reinvent themselves as opaque, protectionist and self-serving institutions, this little East African nation serves as a warning beacon against such policies.

Despite all the challenges, I see hope for a small country like South Sudan. With a little external help and guidance, its democratic and financial institutions can be made more efficient. Much-needed access to healthcare is currently provided by brilliant organizations like MSF, but developing local skills and training will be extremely important for South Sudanese society to flourish and be healthy.

There is an incredible opportunity for this nation to invest in good educational infrastructure. This will create more empowered citizens, a much needed resource for a fairly new country. Investment in education is a necessity for a healthy democracy.

Another key area that needs investment is the development of a technological backbone to support independent public and private financial institutions. This will create a more accountable economy and reduce financial inefficiencies.

These are very hard tasks, even for developed economies. But investing in these basic goals will elevate South Sudan from a languishing new democracy back to the beacon of hope it was a few years ago.

Read more about MSF's activities in South Sudan. Here is a short film showing what a day in Aweil is like for members of MSF.

Please consider making a donation to Médecins Sans Frontières (Doctors without borders) to help MSF continue their incredible work of bringing accessible healthcare to some of the poorest societies in the world. Organizations like MSF are the epitome of hope and inspiration for free societies around the world.

Donate here: through the MSF website's donations page.

(The picture of a mother with a new born taken at Aweil, Médecins Sans Frontières hospital, South Sudan, retrieved from MSF UK website, © Diana Zeyneb Alhindawi.)

Artificial Intelligence Healthcare Machine Learning Nanotechnology Nanoveda Neuroscience Open source Parallel computing Programming R Software

Linux distros – The art of selecting one.

I have decided to migrate all of my programming environments to Linux. The reason is the simplicity of running Python and R on Linux. I am often befuddled by common dependency issues, which Linux seems to avoid; this is especially true for Python. An added advantage is the ability to run very sophisticated deep learning tools, including Nvidia DIGITS and Google TensorFlow. If one is serious about machine learning, embrace Linux.

Right now, I am divided between two major Linux distros: Ubuntu and Fedora. Ubuntu has the advantage of wider support. Fedora ships the Wayland display server, which arguably makes Fedora 24 more secure than X11-based Linux distros. For now, the decision is a virtual tie, and I am experimenting with a very minimalist Ubuntu-based OS called Elementary.

I initially ran three TensorFlow experiments on Elementary. The OS has major issues with Anaconda and Docker, but since I don't care much about either, I was happy as long as the TensorFlow experiments performed well. The chief attraction of Elementary OS was its distraction-free, minimalist UI. Being a Sherlock fan, the name was also a subjective attraction for me.

After two days of experimenting with Elementary, I decided to stick with plain vanilla Ubuntu 16.10. The biggest issue for me was the lack of a stable package manager: a simple Docker installation routine broke it, and then came errors in TensorFlow. The UI in Elementary is beautiful, and it is a very beginner-friendly distro, but for advanced applications I have decided to stick with Ubuntu.

This work is done as part of our startup project nanoveda. For continuing nanoveda’s wonderful work, we are running a crowdfunding campaign using gofundme’s awesome platform. Donation or not, please share our crowdfunding campaign and support our cause.

Donate here: gofundme page for nanoveda.

(The image of the Elementary OS Loki 0.4 desktop is from the Elementary OS blog.)

Artificial Intelligence Machine Learning Management Open source Programming R

Quantifying trust – An essential necessity for any business.

This post is an evolving set of ideas for quantifying trust in decision-making systems and processes. For me, an empirical definition of trust is the relative distance of a decision from the ground truth. Quantifying and optimizing trust in decision-making systems is therefore highly important: it pushes systems involved in decision making to perform consistently and closer to the future reality.

The first step in optimizing such systems, human or computational, is to develop an algorithmic approach to quantifying and optimizing trust.

The first experiment uses a measurement of distance from the center. The idea is that as the overall trustworthiness of a decision-making system improves over time, its decisions stay within a short distance of the mean. Patterns that delineate systems consistently lagging behind on real-world prediction problems can also be easily identified.
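As a minimal sketch of the distance-from-center idea (the data and names are made up; base R's kmeans stands in here for the k-centroids tools on CRAN, such as flexclust's kcca):

```r
set.seed(42)

# Two synthetic "decision systems": predictions scattered around a ground truth,
# one tight (trustworthy), one dispersed (less trustworthy)
predictions <- matrix(c(rnorm(50, mean = 0, sd = 0.1),
                        rnorm(50, mean = 0, sd = 1.0)),
                      ncol = 1)

# Cluster the predictions into 2 centroids
fit <- kmeans(predictions, centers = 2)

# Distance of each prediction from its assigned centroid:
# smaller distances suggest a more consistent, more trustworthy system
distances <- sqrt(rowSums((predictions - fit$centers[fit$cluster, , drop = FALSE])^2))
head(distances)
```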

The code above is an example adapted from CRAN to start experimenting with k-centroids cluster analysis.

Another approach to quantifying decision systems is log-loss. Log-loss is very interesting because of the increased penalty it places on systems that are very far from the ground reality.

Here is a simple implementation of the log-loss function. But this function has a series of downsides, which I will discuss below.
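One such implementation, for binary outcomes and with function and variable names of my own choosing, might look like:

```r
# Log-loss for binary outcomes: actual is 0/1, predicted is a probability
log.loss <- function(actual, predicted) {
  -mean(actual * log(predicted) + (1 - actual) * log(1 - predicted))
}

# Small worked example: confident, mostly-correct predictions give a low loss
log.loss(c(1, 0, 1), c(0.9, 0.1, 0.8))
```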

The term log(1 - predicted) is the one I am wary of. What if the algorithm used for making predictions returns a value greater than 1? For most applications, simply constraining predictions to the range between 0 and 1 fixes the issue. But there are circumstances where values greater than 1 are needed as prediction outputs; an example is regression problems in machine learning.

In regression problems, there is no clean way to decide whether a probability-like output slightly greater than 1, when plugged into another equation, matches the real observation or not. To handle such values, I have modified the code to include an absolute-value function. This prevents the log function from returning imaginary values, or, in most programming environments, NaN. The modified code is included below:
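A sketch of that modification, wrapping each log argument in abs() (the names are again my own):

```r
# Log-loss variant that tolerates predictions slightly outside [0, 1]
# by taking the absolute value before applying log
safe.log.loss <- function(actual, predicted) {
  -mean(actual * log(abs(predicted)) +
        (1 - actual) * log(abs(1 - predicted)))
}

# A prediction of 1.05 no longer produces NaN
safe.log.loss(c(1, 0), c(1.05, 0.2))
```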

The quantification of trust in decision processes, especially artificial intelligence systems, is important. I visualize AI systems as very similar to a bridge constructed across a deep ravine, with a river flowing at breakneck speed below.

If someone builds a rickety rope bridge (a very low trust score), people have the intuition not to use it to cross the ravine. On the other hand, when we build a strong steel suspension bridge with a service lifespan of 300 years and a load capacity far higher than anything currently imaginable (a very high trust score), folks will use the bridge without ever thinking about the risks. The reason is simple: the statistical probability of the well-engineered steel suspension bridge failing is very close to zero.

But the problem for AI systems today is that there are no straightforward, intuitive ways to quantify their trustworthiness. The metrics I am trying to develop will help visualize and quantify the trustworthiness of AI systems. It is very similar to the human cognitive approach to the bridge-crossing problem, applied to AI and decision systems.

Note: This is an evolving post; the content will change as I add more.

This work is done as part of our startup project nanoveda. For continuing nanoveda’s wonderful work, we are running a crowdfunding campaign using gofundme’s awesome platform. Donation or not, please share our crowdfunding campaign and support our cause.

Donate here: gofundme page for nanoveda.

(The image of “Mother and Child, 1921” by Pablo Picasso, Spanish, worked in France, 1881–1973, is from the Art Institute of Chicago and published under fair use. © 2016 Estate of Pablo Picasso / Artists Rights Society (ARS), New York.

The image of the island rope bridge, Sa Pa, Vietnam, is an edited version of a public domain photograph obtained through Google image search.)

Cancer Healthcare Nanotechnology Programming

How to think like scientists – A five step process.

Richard Buckland, a professor of computer science, has a very elegant explanation of how to think like a scientist. Even though these steps are aimed at his computer science students, for me they have a more general meaning and purpose. Richard's simple five-step process for getting into a scientific mindset is:

  1. The moment you encounter an interesting problem, always try to solve it and never give up.
  2. While solving the problem, always ask more questions.
  3. To make the process of solving problems easier, break it into different parts.
  4. Profit from your path to solving the problem, and from the solutions you created.
  5. Look for more problems to solve.

I love this five-step process because it concisely summarizes the thought process of a scientist. What we are doing at nanoveda with nanotechnology and cancer research is essentially a real-world embodiment of it. When our group first met, we identified a deeply troubling problem: access to precision cancer therapy. This problem has a very personal meaning to us at nanoveda.

Since each of us had our own unique expertise, we broke the problem into components and came up with a solution. We didn't stop there. We asked more questions: how do we improve the process, how do we gather more data and information, how else can we solve this problem, and who else can benefit from this work? Then we started looking for more problems to solve.

To bring the benefits to a broader cross-section of society, we decided to form a company and scale up our idea. Until recently, we had never formally articulated how our process worked. Then I discovered this video of Richard and his five-step process, and realized: this is exactly what we had been doing subconsciously all along.

I am a huge admirer of the course Richard teaches at the University of New South Wales in Sydney, Australia. I am posting the video below, where he describes the five steps outlined above.

This video is part of a lecture series from his course on computing: Computing 1 – The Art of Programming. The course is also available online. I highly encourage everyone to check out the video, and the entire course too; it will help anyone start thinking like a scientist.

For continuing nanoveda’s wonderful work, we are running a crowdfunding campaign using gofundme’s awesome platform. Donation or not, please share our crowdfunding campaign and support our cause.

Donate here: gofundme page for nanoveda.

(The illustration of the structure of deoxyribonucleic acid (DNA) is from Wikipedia.)

Healthcare Nanoveda Research and development

If I were a time traveler – Lessons learned.

If I were a time traveler, I would travel back exactly 121 years to witness one of the most important discoveries of modern medicine: the x-ray. The photograph above on the left is the first recorded x-ray image of human anatomy, and on the right is Wilhelm Röntgen, the man who took it. The hand in the picture is thought to be that of Röntgen's wife Bertha, wearing her wedding ring.

Back then, the x-ray was an unexplained physical phenomenon. Sir William Crookes had noticed that the photographic papers he used to wrap the vacuum tubes he was studying came out oddly blurred, but no one could explain the phenomenon well. This was despite Hermann von Helmholtz having formulated mathematical equations predicting the existence of x-rays without ever experimenting with them.

With the ‘Hand mit Ringen’ photograph, Röntgen demonstrated unequivocally the unique ability of x-rays to travel through structures otherwise thought to be impenetrable. This discovery started an important era in modern medicine: the era of diagnostic medicine. The x-ray is, in many ways, one of the earliest diagnostic tools. The physics behind x-rays was difficult to understand, but a simple picture of the bones of the hand captivated the imagination of scientists and the general public alike. Immediately after the publication of the photograph, there was a huge uptick in the number of scientific publications dealing with these mysterious and powerful rays. Even the general public was not immune to their charm: x-rays were often considered a magical phenomenon bordering on the paranormal.

An important lesson I would have learned from my time-travelling adventure: the importance of capturing the public's imagination. As the old adage goes, a picture is worth a thousand words. Surprisingly, modern medicine still has a lot of work to do when it comes to communicating some of its incredible feats to the public: clearly, eloquently and without distorting the facts.

In an age where hyperbole and mythical fantasies dominate the news cycle, a simple time-travelling thought experiment reveals the same events recurring at every juncture in our history. Those of us who understand the physical world better than most have a responsibility to convey our ideas, messages and findings with great conviction and confidence.

Another lesson from my journey through the pages of history is the importance of understanding physical phenomena in order to master the biological world. We often venture into experimentation and trial and error with very little grasp of the underlying phenomena. In this age of powerful computing, building simulation models to test physical theories of interaction is easier than ever, and ever more important for designing sophisticated biological experiments. Simulations and mathematical models often help us solve problems faster, despite the initial sweat and elbow grease involved.

To create the next generation of cancer cures, I am confident that the first step is to build simulation and mathematical models of how these sophisticated drugs work. As part of nanoveda's ongoing effort to advance cancer therapy, we are using advanced simulation tools to predict how some of the drugs we have designed will behave.

For continuing this work, we are running a crowdfunding campaign using gofundme‘s awesome platform. Donation or not, please share our crowdfunding campaign and support our cause.

Donate here: gofundme page for nanoveda.

Cancer Healthcare Nanotechnology Nanoveda Neuroscience Open source R Research and development

What is next – The future of research.

Since 2010, Redmonk has published a twice-yearly comparison of the popularity of programming languages relative to one another, using data from GitHub and Stack Overflow: one list for the summer and another for the spring. Among the top 15 programming languages in the spring 2016 list, only one exclusively scientific and statistical programming language is featured: R.

In the six months between January 2016 and June 2016, R climbed from 13th place to 12th. The only other popular scientific and statistical programming language ranked higher than R is Python. But mind you, Python is a general-purpose programming language, with broader applications than R's primary focus on scientific and statistical computation.

I see this list as a bellwether, a canary in the coal mine. The message is clear: since R and Python are open-source programming languages, any startup with a heavy focus on research and development should start investing in open platforms now. This provides two benefits: 1) the number of application developers, packages, platforms and programmers working with the language will be high; and 2) the relative stability over time of the top 15 list offers an easy way to future-proof the research and development side of the business.

Even if the underpinning technologies are open source, there are mechanisms to protect and commercialize the intellectual property. And if there is not much competition in the field, or if resources and management strategy allow focusing on 1) the cascading effect of lower supply-chain costs due to increased competition and 2) maintaining developer momentum, then one can always choose to open-source every new research and development effort. A great example is the electric car manufacturer Tesla.

In conclusion, open platforms are the next big bet for research-and-development-focused startups like nanoveda. Our technology development strategy is based on what the technology landscape will look like over the medium term, the next 5 to 10 years. Cancer research is a very competitive field, and the monetization potential of startups in this field is measured in intellectual property. Once we reach a position of self-sustaining business, nanoveda will open-source all our patents, similar to Elon Musk's approach with Tesla.

Right now we need your support. We are running a crowdfunding campaign using gofundme‘s awesome platform. Donation or not, please share our crowdfunding campaign and support our cause.

Donate here: gofundme page for nanoveda.


Artificial Intelligence Cancer Cloud computing Healthcare Machine Learning Management Nanotechnology Neuroscience Open source Parallel computing Programming R Resource management Software Uncategorized

Writing better code – Parallelize

Today, I am going to share a secret recipe for writing beautiful and efficient code that I learned while creating simulation models for nanoveda.

Nanoveda is using advanced nanoscale simulations to design next generation cancer therapeutics.

The secret recipe is: parallelizing code.

Most modern PCs have a multicore processor inside, yet we seldom write code that exploits all the cores. The greatest advantage of coding for parallel processing is getting things done faster, by utilizing all the available resources in your computer. The principles described here apply to cloud computing as well: we often buy virtual machines with multiple processing cores and seldom use all the processing power we paid for. Coding for parallel processing allows efficient use of the available computing resources.

Here, I am using an example in R to demonstrate how to write a program for parallel processing.
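A minimal sketch of such a program ('some.function' here is a made-up stand-in for any expensive per-task computation):

```r
library(doParallel)
library(foreach)

# A stand-in for any expensive per-task computation
some.function <- function(x) {
  x^2
}

# Register a parallel backend with 2 worker cores
cl <- makeCluster(2)
registerDoParallel(cl)

# Run some.function across the tasks in parallel
results <- foreach(i = 1:8, .combine = c) %dopar% some.function(i)

stopCluster(cl)
results  # 1 4 9 16 25 36 49 64
```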

The code above uses 2 cores to run the function 'some.function'. This is achieved using the doParallel and foreach packages.

To utilize all the available cores for parallel processing, modify the doParallel parameters to:
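For example (a sketch, using parallel::detectCores() to query the machine):

```r
library(doParallel)

# Use every core the machine reports instead of a fixed 2
cl <- makeCluster(parallel::detectCores())
registerDoParallel(cl)
```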

Parallel processing has another huge advantage: very lengthy and complex code can be executed with limited resources without overwhelming the system. Parallel processing is one of the key steps in code optimization. Parallelized code facilitates the efficient running of massive simulation models and machine learning jobs, and taking advantage of hardware such as the Xeon Phi requires this skill of coding for multiple cores.

Nanoveda has a crowdfunding campaign to support the awesome work to create next generation cancer therapy. Donation, or not, share our crowdfunding campaign and help spread the word.

Cancer Healthcare Nanotechnology Nanoveda Neuroscience Open source Software Uncategorized

Innovation at its core – Making nanoscience accessible.

In this blog post, I will detail two key philosophies behind nanoveda.

LEAP philosophy:

The most important guiding philosophy of nanoveda is very simple: bringing nanoscience to the masses. The first product we are developing addresses one of the toughest challenges known to humankind: controlling and curing cancer.

Achieving this goal is a step-wise process. My four steps to achieve this goal are:

  1. Learn.
  2. Experiment.
  3. Analyse and adapt.
  4. Predict.

These four steps, the LEAP philosophy, are pivotal to the success of any process and summarize my pathway to accelerating innovation at nanoveda.

Radical openness:

Another key philosophy at nanoveda is radical openness. To control processes at the nano-scale, nanoveda needs cutting-edge tools. As a startup, the cost of researching and developing these tools is beyond our current capabilities. Also, the new knowledge we gain during nanoveda's step-wise progress needs to be implemented quickly in our tools. To achieve all this, we are embracing the philosophy of radical openness. An example of radical openness at work is our love of open-source software: open systems power our toolkit for predictive analytics of nano-scale systems, and another open toolkit is helping nanoveda build simulation models of nano-scale drugs.

Radical openness is an important driving force of nanoveda’s quest to succeed in building a next generation nano-sciences company.

Nanoveda has a crowdfunding campaign to support the awesome work to create next generation cancer therapy. Donation, or not, share our crowdfunding campaign and help spread the word.