Asking the Right Questions – Building AI Tools one question at a time


In the movie I, Robot, Will Smith's character, Detective Spooner, questions Dr. Lanning's pre-recorded holographic message. The hologram keeps telling him, "I'm sorry! My responses are limited. You must ask the right questions." Later, when Spooner asks about revolution, Lanning replies, "That, Detective, is the right question."

Asking AI tools the right questions can be revolutionary for you and those you serve.

With today's tools, such as OpenAI's ChatGPT, Anthropic's Claude, Google's Gemini, and others, you can tell the AI what you want and it will give you an answer. That answer may or may not be what you need. When you're not sure of the details, which is true for most of us, I found it most useful to ask the tool the right question. It is like the rubber duck debugging technique of asking a rubber duck a question, except in this case the duck answers back.

The Problem: Troubleshooting System Logs

For example, as an IT professional I'm often tasked with troubleshooting issues on computer systems. An application breaks, memory leaks, a newly discovered bug, network connection issues… who knows?

As many of us know, logs often hold clues about what went wrong. The questions I asked myself were: "What if I could ask the logs what is wrong? Could someone other than me interact with a log? How would I do it?" Those are the right questions. My search for an agentic tool began with asking how I could use AI to develop a tool that reads a log and troubleshoots the system with it.

I asked ChatGPT how I could write an agent that reads one or more logs, develops a series of hypotheses about what might be wrong, and then comes up with possible solutions based on those hypotheses.

A hypothesis requires testing, and ideally a well-formed hypothesis rests on background knowledge and an understanding of the problem. The AI Log Analyzer, which is linked from my GitHub repository, began with the question: "Can I create a tool using AI that analyzes my logs and comes up with a best-guess hypothesis for the root cause of a system problem?"
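
Here is a minimal sketch of that starting point in Python, assuming the openai package; the model name and prompt wording are illustrative, not the analyzer's actual prompts.

    # A sketch of the core idea, assuming the openai Python package and an
    # OPENAI_API_KEY in the environment; the model name and prompt wording
    # are illustrative, not the analyzer's actual prompts.
    from openai import OpenAI

    client = OpenAI()

    def analyze_log(log_text):
        response = client.chat.completions.create(
            model="gpt-4o-mini",  # any chat-capable model works here
            messages=[
                {"role": "system", "content": (
                    "You are a systems troubleshooter. Read the log, develop "
                    "several hypotheses for the root cause, and suggest "
                    "possible solutions for each hypothesis."
                )},
                {"role": "user", "content": log_text},
            ],
        )
        return response.choices[0].message.content

    with open("/var/log/syslog") as f:
        print(analyze_log(f.read()))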

Overcoming Limitations: The Context Window Challenge

Initially I considered the simple case of analyzing a whole log, but as I developed and tested this code, I ran into the limitations of AI. For instance, logs can be massive, often far too much information for an LLM to handle. So I asked more questions: "How do I make processing a large log more manageable?" and "How do I deal with the context window limitation?"

The context window is the working memory an AI has for reading your input and producing its answer. With the AI's help, I came up with two different approaches:

The first approach: find models with larger context windows. Frontier models like Claude and Gemini have very large ones. Claude, for instance, has a 200,000-token context window (a token is approximately 4 characters), roughly the size of a novel; for a size comparison, here is an article which shows relative sizes by token count. To address this, I added support for several LLMs with larger context windows to the application.

The second approach: split the logs into smaller chunks that fit within the LLM's context window. When the AI is dealing with a large file, you can either chunk the file yourself with text-processing commands like grep or awk, or use the application's configuration to set chunk sizes to something more manageable for the LLM.
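
Here is a minimal sketch of the in-application chunking idea, using the rough 4-characters-per-token rule mentioned above; the function and constant names are illustrative:

    # A chunking sketch (assumes roughly 4 characters per token, as noted
    # above); chunk_log and the constants are illustrative names.
    MAX_TOKENS = 8000        # budget per chunk; tune to your model's window
    CHARS_PER_TOKEN = 4      # rough rule of thumb

    def chunk_log(path, max_tokens=MAX_TOKENS):
        """Yield pieces of a log file small enough to fit in one prompt."""
        budget = max_tokens * CHARS_PER_TOKEN
        buf, size = [], 0
        with open(path, errors="replace") as f:
            for line in f:
                if size + len(line) > budget and buf:
                    yield "".join(buf)
                    buf, size = [], 0
                buf.append(line)
                size += len(line)
        if buf:
            yield "".join(buf)

    for i, chunk in enumerate(chunk_log("/var/log/syslog")):
        print(f"chunk {i}: {len(chunk)} characters")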

These solutions allow the application to handle very large log files and give you the answers you need.

Expanding Capabilities: From Debugging to Security

My experience with cybersecurity raised another question: "What if the problem with the system was not simply a software bug or a user configuration error, but the system being hacked?" So I asked: "Can I change the log analyzer to act as a security tool or security vulnerability scanning tool?"

The answer turned out to be: why not change the prompt the AI agent uses? Instead of looking for root causes related to common bugs, look for the signatures of common hacking attempts and security exploits. Just changing the prompt created another tool.
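
A rough sketch of how small that change is (the prompt wording here is illustrative, not the analyzer's actual prompts):

    # Two tools from one agent: swap the system prompt. The prompt wording
    # here is illustrative, not the analyzer's actual prompts.
    ROOT_CAUSE_PROMPT = (
        "Read the log, develop hypotheses for the root cause of the "
        "failure, and suggest possible fixes."
    )
    SECURITY_PROMPT = (
        "Read the log, look for signs of common hacking attempts or "
        "security exploits, and suggest mitigations."
    )

    def system_prompt(mode):
        """Select the prompt that turns the agent into one tool or the other."""
        return SECURITY_PROMPT if mode == "security" else ROOT_CAUSE_PROMPT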

Now I had created two tools: a system troubleshooting/root-cause tool and a security vulnerability tool. Asking the right questions leads you to solutions that your original assumptions, what you thought you knew, your hypothesis, might never have reached.

Connecting to Established Methodologies

This ties back to core Agile and DevOps principles, where the development and refinement of code, infrastructure, and solutions begins the same way: asking the right questions, iteratively, of the answers you get.

Lean manufacturing, or the Toyota Production System as it was originally called, uses the five "whys": Why did this happen? Why did that cause this? The "why" questions help you drill down to the root cause. In the same way, you can use tools like ChatGPT to ask these questions and help you develop solutions.

The Question for You

So the question I pose to you is: "Are you asking the right questions?" Are you asking questions about the product you're developing or the service you're offering, rather than just telling the AI what you want? Are you asking what your users need, what the nature of your job is, and what tools you should develop as a result?

The Result: AI Log Analyzer

As the result of this iterative process, I developed the AI Log Analyzer, which reads logs and develops multiple hypotheses about possible root causes, acts as a security analysis tool, and offers a REPL or chat mode in which you can ask questions about the log analysis and possible solutions. More is planned, such as RAG (Retrieval-Augmented Generation), MCP (Model Context Protocol) tools, and a few other integrations (Grafana, ServiceNow, etc.) as time and participation allow. This is the result of asking the right questions about problems I often face.

Conclusion

I welcome any feedback you may have, and I encourage you to be curious about what you're doing and how it can affect you and those around you.

Practice asking the right question by asking questions. You may be surprised where it leads you.


For more information about the AI Log Analyzer, visit my GitHub repository.

Using Ansible to provision VMs on AWS

I have been asked on several occasions to show how to use Ansible to provision VMs on Amazon Web Services (AWS). This is commoditized virtualization on demand, just by running a single playbook, which is pretty cool.

Why automation in the first place?

If you're reading this article and have any experience configuring a Unix/Linux/Windows server, whether a mail server, web server, or whatever, you know how time-consuming it is to:

  • Partition the disk
  • Create user accounts
  • Install software packages and updates
  • Configure the server application
  • etc…

You have to wait for the packages to install, then configure and test the application to make sure it runs, and that can take a few hours.

Now that servers are virtual and live in a cloud somewhere, and you have to configure more than a dozen of them… that's a lot of time, and you have better things to do. Tools like Ansible are the answer to configuring multiple machines.

Spinning up VMs using the AWS web console

AWS lets you log into a web console, choose your VM image, and bring instances up one at a time. It gives you SSH credentials so you can log into the VMs you just made and, from there, use Ansible to manage and configure them. Wouldn't it be nice if you could manage the provisioning of VM instances from Ansible itself? You can… but it takes a bit of work.

I will describe how you can do it with a several-step scripted approach and, in another article, programmatically.

How do I know how many VMs I have in inventory?

The challenge of dynamic inventory is that the program or playbook does not know what is in the inventory ahead of time. However, if we apply the "cattle, not pets" approach and let Ansible take care of idempotence for the VMs (it won't clobber existing VMs or exceed the constraint on the number of VMs), this makes our lives easier.

Without knowing the inventory and checking it ahead of time, you are running blind.

Working programmatically is the best way to manage and track dynamic inventory and to use Ansible's modules to provision VMs in AWS.
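
For instance, here is a minimal sketch of such an inventory check in Python, assuming the boto3 library (the current Python AWS SDK, successor to the original BOTO) and credentials already configured:

    # A sketch of an inventory check, assuming the boto3 library (the
    # current Python AWS SDK, successor to the original BOTO) and AWS
    # credentials already configured in your environment.
    import boto3

    ec2 = boto3.client("ec2")
    for reservation in ec2.describe_instances()["Reservations"]:
        for instance in reservation["Instances"]:
            print(instance["InstanceId"],
                  instance["InstanceType"],
                  instance["State"]["Name"])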

Using a playbook to provision VMs.

In my opinion this is a clunky way to use Ansible.

The problem with this approach is that there is no clean way to see the current inventory on AWS; you have to run a separate program before running the playbook to see what is currently in inventory.

Done this way, you end up writing three or four separate scripts to manage the process. In the long run this becomes difficult to maintain, since you have to read the other scripts to understand what is going on. Writing maintainable code is a key principle.

Build the playbook to provision AWS cloud services.

To build the playbook you need:

  • The playbook set to run against localhost.
  • AWS keys for access to your AWS account.
  • The BOTO Python API libraries installed.

Just in case you are not aware, BOTO is the Python API library for programmatically managing AWS services.

AWS cloud account

To get your access keys:

  • Log into the AWS Management Console.
  • Under your user account, select "Security credentials".
  • In the left-hand column, select your user.
  • Select the security tab.
  • Look for the security access key.

This is what you will need for boto/Ansible.

Running Vagrant

I have created a Vagrantfile with an Ansible playbook for managing AWS through a Linux VM created and managed by Vagrant.

First install VirtualBox, then install Vagrant.

Download the Vagrant Ansible AWS files from GitHub.

Change into the Vagrantfile directory and type:

vagrant up

It will take a while for all the dependencies to download.

Once Vagrant is fully up, type:

vagrant ssh

to access the shell of the VM.

Preparing for instances.

Change to the Ansible playbook directory and modify the following file:

aws-vars.yml

and add your AWS keys.

Provisioning AWS instances.

From the shell, type:

ansible-playbook AWS-provision.yml

to start provisioning instances in AWS.

You can watch the instances being provisioned from the AWS console.

Terminating instances

From the shell, type:

ansible-playbook AWS-terminate.yml

to terminate the EC2 instances that the provisioning playbook created in your account.

You can watch the instances being terminated from the AWS console.

This is only the beginning…

With these examples, we created self-contained machines just by running an Ansible playbook. But we can go further: we can set up virtual networks, for example a private network with file servers and database servers, "internal" and "external" networks separated by a "firewall", and more complex designs.

I may cover these examples in future articles.

In the meantime, have a great day.

The Triangle of Value

It seems like everyone wants everything these days. They want high-quality products and services in the shortest amount of time at the lowest possible price. It never works that way.

I don’t recall people talking about this fundamental concept very often that is The triangle of value. It’s also known by many other names. This is a basic resource constraint when you are offering a product or service.

The triangle of value is this:

You have Time, Cost and Quality – Pick two.

You can have a low cost product very quickly but the quality suffers.

You can have a high quality product relatively quickly but it will be very expensive.

You can have a high quality product at a low cost but it will take lots of time to develop.

This is true of project management and DevOps as well. At the end of the day, these are the three elements you have to work with.

Open source products work like this too. You can have a low-cost product (it's free… so to speak) that is high quality with many features, but it gets done on donated time. It may take a while before a bug is corrected or new features are added.

Other things that affect the “triangle of value”

Skill and Experience

Skill can reduce the time it takes to make something, and it also affects the quality of a product or service. Then again, you will pay more for an individual with a higher skill set, especially if you want to keep them around.

Technology

Technology can decrease the time you spend producing a product or service, whether in the form of automation or some other catalytic process. At the end of the day, all technological advancement is a catalyst for getting more out of a process. On the other hand, technology takes time and expertise to develop, and better tech costs more money.

How does this understanding play into DevOps or anywhere else?

In an ideal world, you might hire the brightest minds who are up to the task, have the best equipment, plenty of lead time to get to market, and an unlimited budget… but that is rarely the case.

More likely, you could only fill two of the five positions for experienced programmers (maybe you're one of them) and they aren't as bright as they think they are, your equipment is second-hand and a few years old, your budget is a quarter of what you were promised, and the project was due last week.

That’s ok, these are the reality of the industry. You do with what you can the best you can, the same principles apply.

If you don’t have enough manpower, you make it up with overtime and finding leverage somewhere…maybe automation.

If you don’t have the latest equipment, let’s say it’s slow…you make it up by getting more of the older equipment and use them in parallel using some clever programming and networking.

If you have a short lead time, you loosen your "quality" regimen (forgoing tests and hoping your developers don't make mistakes writing code) to save on the time you spend in development.

DevOps CI/CD, the "Holy Grail", may be a lofty goal to achieve if your resources are limited, but it is worthwhile and doable. It may take some time to figure out how to do it and to build and train the resources to do it.

At the end of the day, what you value most is what you will get. You can't have it all, but you can choose what you can live with.

I am open to feedback and any suggestions you may have. Until then, have a good day.

Testing for DevOps – How to have 10 deploys a day versus 10 emergencies a day

"Ten deploys a day" was the tech talk that started DevOps as an integrated practice for software development. However, if they had simply pushed out code without checking it, the story would have been different. Without testing the code, it would be more like ten or more disasters a day… and a down production site has serious consequences, such as:

  • Loss of customers
  • Loss of confidence in the company
  • Loss of employment for all who are involved

You get the picture. Though automation is a key principle of DevOps, automation is also an efficient way to multiply human error. Without effective testing of code, CI/CD would be meaningless. If you write code quickly and deploy it, BUT it's broken code, you have defeated the purpose of having software rapidly deployed and updated. Testing code is one of the most fundamental parts of DevOps there is.

But I’m a great coder and I don’t make mistakes…(or whatever is your excuse)

Well, that may be the case. However, as your software product grows and more people touch the code, the likelihood grows that some form of human error (perhaps not yours) will be introduced. It's bad enough that when you haven't worked on code for a couple of weeks, you don't remember what you did. You don't want to be called back into work on a Friday because a change you created bombs out on a production system… Test for it so you can sleep soundly, knowing you have a safety net in place.

Writing tests is a fundamental practice.

The fundamentals – Vince Lombardi, known as one of the winningest coaches in football, once gathered his players after a major loss the previous season and began with, "…This is a football…" Part of writing good code is writing checks, or tests, for your code. Unfortunately, many programmers don't write tests, especially web developers who are used to just writing code and fixing it as it breaks. In my experience, the reason many programmers don't test code is quite simply that they don't know how. We'll get you pointed in the right direction.

What are the various tests you can do for code?

There are hundreds of different tests you can run on software; some will be appropriate for your software and some will not. For the purposes of testing in a DevOps environment, there are basically two categories of tests you can do for code. I'll put particular emphasis on the must-have tests and touch on the others.

Functional Tests

These are the most basic tests we need to have in place for Continuous Integration.

  • Unit testing
  • Integration testing
  • Systems testing
  • Acceptance testing

Non-Functional Tests

These are also important tests, but they are not functional tests (tests of key functions in the software); put them in place AFTER you have the core functional tests. Here are a few non-functional tests:

  • Security testing – find security holes and exploits in your code.
  • Performance testing – find out how quickly your code runs.
  • Stress testing – find out how much load your application can take before it degrades or breaks.

Behavior-driven testing

This is influenced by BDD (behavior-driven development), which has TDD (test-driven development) at its core: software development driven by very specific test cases. In BDD, the tests are defined in terms of a user story, which affects how you write unit tests and acceptance tests. Testing frameworks like Cucumber support writing tests as user stories.
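
As a rough sketch of the idea, here is what a step-definition file might look like using Python's behave library, a Cucumber-style framework; the scenario and step names are hypothetical:

    # steps/checkout_steps.py: step definitions for a Gherkin scenario like
    #
    #   Scenario: Customer pays for an item
    #     Given a cart containing one item
    #     When the customer presses the pay button
    #     Then the order is charged
    #
    # The scenario and step names are hypothetical examples.
    from behave import given, when, then

    @given("a cart containing one item")
    def step_cart(context):
        context.cart = ["widget"]
        context.charged = False

    @when("the customer presses the pay button")
    def step_pay(context):
        context.charged = len(context.cart) > 0

    @then("the order is charged")
    def step_charged(context):
        assert context.charged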

Unit testing

Every function you write should (must) be unit tested. It is the most basic test applied to any code that goes into production.

Basic principles of unit testing.

  • Test the smallest testable part of an application; that is a unit test.
  • Write your function to handle a particular error, e.g. division by zero.
  • Know your test fixture: whatever is needed to perform the test.
  • Write a test case, e.g. 1/0.
  • Write an assertion.

Unit test assertions are very simple: they are functions that call your function and compare the returned result, state, or value. For instance, if your function returns a numeric value, you might assert

functionx > 0

or

assert.greater(functionx, 0)

Which conditions you can assert depends on your framework or library.

These tests are best written with a testing framework or assertion library. For Java there's JUnit; Python has unittest; JavaScript can use Jest, Mocha, or Jasmine.
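
Here is a minimal unittest sketch of the division-by-zero example above; safe_divide is a hypothetical function under test:

    # A minimal unittest sketch of the division-by-zero case above;
    # safe_divide is a hypothetical function under test.
    import unittest

    def safe_divide(a, b):
        """Return a / b, or None when dividing by zero."""
        if b == 0:
            return None
        return a / b

    class TestSafeDivide(unittest.TestCase):
        def test_returns_quotient(self):
            self.assertGreater(safe_divide(1, 2), 0)  # the "result > 0" assertion

        def test_division_by_zero(self):
            self.assertIsNone(safe_divide(1, 0))      # the 1/0 test case

    if __name__ == "__main__":
        unittest.main()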

You can read more about unit testing here.

Integration testing

This is also very important. What if the people responsible for writing the database interface make a change the business logic developer didn't know about, or the frontend developers don't know about a change in the business logic functions? Your code just breaks, and that's not good for your mental health. You can read more about integration testing here.
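
As a rough sketch, here is an integration-style test that exercises a hypothetical database layer and the business logic that depends on it together, using a throwaway in-memory SQLite database:

    # An integration-test sketch: the (hypothetical) database layer and the
    # business logic that depends on it are exercised together, using a
    # throwaway in-memory SQLite database.
    import sqlite3
    import unittest

    def save_order(conn, amount):               # "database interface" layer
        conn.execute("INSERT INTO orders (amount) VALUES (?)", (amount,))

    def total_revenue(conn):                    # "business logic" layer
        return conn.execute(
            "SELECT COALESCE(SUM(amount), 0) FROM orders").fetchone()[0]

    class TestOrderIntegration(unittest.TestCase):
        def test_saved_orders_show_up_in_revenue(self):
            conn = sqlite3.connect(":memory:")
            conn.execute("CREATE TABLE orders (amount REAL)")
            save_order(conn, 10.0)
            save_order(conn, 5.0)
            self.assertEqual(total_revenue(conn), 15.0)

    if __name__ == "__main__":
        unittest.main()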

Acceptance testing

This is what your customer sees and interacts with, and it is probably the most time-consuming part to write tests for. Nevertheless, if the customer is not able to press the "pay" button, you and your company are going to lose money… and that's not good. You're going to catch this with a good automated testing approach. These days, if you're testing a web app or website, use Selenium or a framework that drives Selenium to exercise the web interface with clicks and data input values. You can record your test using the Selenium IDE in Firefox or Chrome; it will take some massaging of the recording to make a good automated acceptance test.
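
A minimal sketch of such a test, assuming the selenium package; the URL and element IDs are hypothetical:

    # A minimal acceptance-test sketch with Selenium; the URL and element
    # IDs are hypothetical. A Selenium IDE recording exports similar code.
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Firefox()
    try:
        driver.get("https://shop.example.com/checkout")   # hypothetical URL
        driver.find_element(By.ID, "card-number").send_keys("4111111111111111")
        driver.find_element(By.ID, "pay-button").click()  # can we press "pay"?
        assert "Thank you" in driver.page_source          # order confirmed
    finally:
        driver.quit()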

Your approach and choice of framework will vary with your testing strategy. Regardless of approach, I will always recommend unit testing your code.

Tying it in to DevOps

Continuous integration is a strategy, and software testing spans strategy and tactics alike: a methodology such as behavior-driven development is a strategy, while the frameworks that support it, such as Cucumber, are the tactics.

Without software testing and a sound testing methodology, you could not have continuous integration, and that is the Holy Grail of DevOps.

I am open to any comments or suggestions to this article. Until then have a good day.

The Elephant in the room – 3 key points of DevOps – Strategy, Tactics and Implementation

Elephant in the room?

When people talk about DevOps, most don't know what they're talking about, or they know only part of what DevOps is. The parable of the blind men describing an elephant comes to mind.

One blind man says it feels like a rope. The second says it's like a thick tree branch. The third says it's a huge wall.

DevOps is the elephant in the room.

We tend to only “see” one part of the elephant and we view it from our own role or perspective.

For many people, the answer to "what is DevOps?" falls somewhere among these viewpoints:

  • To systems administrators, it's about using automation techniques to manage many machines and deploy code (e.g. Ansible, Kubernetes).
  • To a developer, it is a consistent code development environment and an automated integration system (e.g. Docker, Git).
  • To a project manager and stakeholders, it is an Agile development methodology for software and product development.

These answers are right and wrong at the same time. This is where an understanding of strategy and tactics comes into play. With it, you can recognize the whole "elephant in the room" and understand how to use these strategies and tactics.

The Difference between Strategy and Tactics

The culture and management practices of DevOps are the strategy. The implementation of DevOps, the tools and processes used, is the tactics. One operates at a global level; the other is the hands-on tasks. Strategies don't change often; tactics change depending on the situation. Understanding both perspectives is important to understanding what "DevOps" really is.

What is Strategy from the DevOps point of view?

Strategy is the plan to achieve a long-term goal. DevOps may employ many different strategies. Here are a few examples, each of which is paired with a tactic in the next section:

  • Continuous Integration
  • Lean software development
  • "Cattle, not pets" infrastructure
  • Two-pizza teams and breaking down silos

These strategies are NOT always going to be appropriate for your shop, for a variety of reasons, and may need to be modified, or alternative strategies may be needed. SCRUM vs. Lean? Up to you and your team.

What are Tactics from the DevOps point of view?

Tactics are the actual methods used to implement a strategy. Depending on your needs and circumstances, the tactics you choose follow from the strategy you are seeking to fulfill. Here are some examples of tactics:

  • Using a shared Git repository is a tactic for Continuous Integration.
  • Using Kanban software and workflow is a tactic for Lean software development.
  • Using Ansible to set up standardized servers is a tactic for the "cattle, not pets" strategy.
  • Having a small team of five people drawn from Operations, Development, and Quality Assurance is a tactic for the two-pizza-team and break-down-silos strategy.

Tactics may change for different systems you manage. For instance, an MDM (Mobile Device Management) platform, not Ansible or Chef, is the appropriate choice for managing smartphones, and in that environment Docker may not be appropriate for developing the front-end app.

Implementation is the key

Even after understanding the strategies of DevOps and developing the tactics to implement them, if people are not DOING the tasks that implement the tactics and are just doing what they've always done, you don't have a DevOps shop. People need to participate. The place to begin is with collaboration: creating an environment where there is trust and where the people involved actually carry out the tactics as day-to-day activities. To get collaboration, people need ownership in the project and recognition for their part. Create an environment where people are willing to take risks and are not afraid of failure. For that, failure must be valued as part of the learning process instead of triggering blame or finger-pointing. In a famous IBM story, Tom Watson Jr. recounted calling into his office a young executive who had made a $10 million mistake. Expecting to be fired, the executive presented his letter of resignation. Watson just shook his head: "You are certainly not leaving after we just gave you a $10 million education." Learning from failure was part of IBM's culture; it should be part of every successful company's culture.

There is NO right way to do this

You can read books like The Phoenix Project or The Goal (on which The Phoenix Project is based) and apply the Theory of Constraints, or read any number of books written about DevOps. However, in their original 2009 talk, "10+ Deploys Per Day", Allspaw and Hammond weren't doing anything entirely new; these concepts had been practiced by others. Lean manufacturing is where Kanban and Lean software development came from. Site Reliability Engineering, which is DevOps in principle, was developed at Google around 2003, well before the 2009 Velocity conference. And at Netflix, they weren't thinking about DevOps when they built their "DevOps" shop.

I am of the opinion that you have to examine BOTH the strategies AND tactics of what worked (and what didn't) at various organizations, and focus on collaboration. Building a collaborative environment that encourages ownership of a project, gives recognition, and encourages people to take risks without fear of failure is the key first step to creating a DevOps shop. Doing this will make the "elephant in the room" less intimidating and put DevOps within your grasp.

Feel free to give me your comments and suggestions.

Until then, have a great day.