Category | Interview Questions
Last Updated On 18/04/2026
You’ve been coding in Python for months or even years, and you feel confident with the language. But when the interview begins, the questions often take unexpected turns—especially common python interview questions for freshers that test fundamentals in ways you didn’t anticipate.
That's the reality of Python Interview Questions in 2026. They don't just test whether you can write a loop or define a function. They test whether you understand how Python actually works under the hood, how you handle data at scale, and whether you can solve problems cleanly under pressure.
This guide covers 100+ carefully selected Python Interview Questions and Answers, organized from foundational syntax all the way through to advanced architecture and optimization questions. Whether you're a fresher preparing for your first screening call or a senior engineer targeting a lead role, this is your complete preparation reference.
Each section builds on the previous one, matching how real Python interviews are structured. Work through it in order or jump straight to the section that matches your target role.
| Section | Focus Area | Who It's For |
| --- | --- | --- |
| Section 1 | Basic Python syntax, data types, and built-ins | Freshers and all candidates |
| Section 2 | OOP, memory management, decorators, GIL | Mid-level and above |
| Section 3 | Pandas, NumPy, data handling | Data engineering and analytics roles |
| Section 4 | Coding problems and algorithmic patterns | All technical rounds |
| Section 5 | Advanced internals, async, closures | Senior engineers |
| Section 6 | Architecture, optimization, production scripting | Lead and architect roles |
| Prep Tips | Study strategy by experience level | Everyone |
These are the questions that appear in virtually every Python screening round, regardless of role or seniority. Many of these are standard python interview questions for freshers, designed to test core fundamentals early in the process. Getting them wrong signals weak basics, which is why interviewers use them as a quick filter. Don’t underestimate them.
Python is a high-level, general-purpose programming language known for its clean syntax and readable code. It's interpreted, dynamically typed, and supports multiple programming styles, including procedural, object-oriented, and functional.
What makes it popular across so many fields comes down to a few things:

- Clean, readable syntax that's quick to learn and easy to maintain
- A massive standard library plus a third-party ecosystem for nearly every task
- One language that spans web development, automation, data analysis, and machine learning
- A large, active community, which means strong documentation and fast answers
PEP 8 is Python's official style guide. It defines conventions for writing readable, consistent Python code, the kind that other developers can pick up and understand without having to decode your formatting choices.
The most important conventions to know:

- 4 spaces per indentation level, never a mix of tabs and spaces
- snake_case for functions and variables, PascalCase for classes, UPPER_CASE for constants
- Two blank lines between top-level functions and classes
- Spaces around operators and after commas
- A maximum line length of 79 characters (many teams relax this to 88 or more)
PEP 8 matters in interviews because it signals professional coding habits. Interviewers notice when code is consistently formatted versus when it looks like it was written in a hurry.
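As a quick sketch of these conventions in practice (the function names here are invented for illustration):

```python
# snake_case names, 4-space indentation, two blank lines between
# top-level definitions, spaces around operators and after commas.
MAX_RETRIES = 3  # constants in UPPER_CASE


def average_score(scores):
    """Return the mean of a non-empty list of scores."""
    return sum(scores) / len(scores)


def is_passing(score, threshold=40):
    """Booleans read as predicates: is_, has_, can_."""
    return score >= threshold
```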
This is one of the most common Basic Python Interview Questions that trips people up because the answer is "both, in a way."
When you run a Python script, two things happen:

1. The source code is compiled to bytecode (the .pyc files you see in __pycache__).
2. That bytecode is then executed, instruction by instruction, by the Python Virtual Machine (PVM).
So Python is technically compiled to bytecode first, then interpreted. The compilation step happens automatically and invisibly. This is why Python is generally described as an interpreted language, even though that's not the complete picture.
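You can watch this happen with the standard dis module, which disassembles a function's compiled bytecode:

```python
import dis


def add_one(x):
    return x + 1


# dis.dis prints the bytecode instructions CPython will interpret,
# e.g. LOAD_FAST, BINARY_OP / BINARY_ADD, RETURN_VALUE
# (exact opcode names vary slightly across Python versions).
dis.dis(add_one)
```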
A mutable object can be changed after it's created. An immutable object cannot.
Mutable examples: list, dict, set, bytearray

Immutable examples: int, float, str, tuple, frozenset, bytes
This distinction matters in practice. Immutable objects are safe to use as dictionary keys. Mutable objects are not, because their state can change after being used as a key, which would break the dictionary's internal structure.
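A short demonstration, using a tuple key versus a list key:

```python
scores = {}
scores[("Alice", "math")] = 95   # tuple key: fine, tuples are immutable

try:
    scores[["Alice", "math"]] = 95   # list key: raises TypeError
except TypeError as e:
    print(e)  # lists are unhashable

# Mutating a list changes the object in place; "changing" a string
# actually creates a brand-new object.
nums = [1, 2]
nums.append(3)        # same list object, modified in place
name = "ab"
name = name + "c"     # a new string object; "ab" itself is unchanged
```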
Python's core built-in types are: int, float, str, bool, list, tuple, set, frozenset, dict, and NoneType.
Tuple over a list when:

- The data shouldn't change after creation (coordinates, configuration values)
- You need a hashable sequence to use as a dictionary key or set element
- You're returning multiple values from a function
Set over a list when:

- You need fast membership tests (O(1) average, versus O(n) for a list)
- You need to enforce uniqueness or remove duplicates
- You need set operations like union, intersection, and difference
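A small sketch of both points (sizes and values chosen arbitrarily for illustration):

```python
items = list(range(100_000))
items_set = set(items)

# Same answer, very different cost:
print(99_999 in items)      # O(n) scan through the list
print(99_999 in items_set)  # O(1) hash lookup

# Tuples can do what lists can't: serve as dictionary keys.
locations = {(40.7, -74.0): "New York", (51.5, -0.1): "London"}
print(locations[(51.5, -0.1)])  # "London"
```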
This is a question that catches a lot of candidates who haven't thought carefully about Python's object model.
The reason this causes confusion is Python's integer caching. For small integers (typically -5 to 256), Python reuses the same object in memory, so a is b returns True even when a and b were assigned separately. For larger integers and for most strings, this caching is not guaranteed: two variables with the same value may be different objects.
The practical rule: use == for value comparisons. Reserve is for the cases where you specifically want to check identity, most commonly if x is None.
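A quick demonstration. Note that the exact caching range is a CPython implementation detail, not a language guarantee:

```python
a = 256
b = 256
print(a is b)   # True in CPython: small ints (-5..256) are cached

a = 257
b = 257
# a is b may be True or False here depending on how CPython compiled
# the code, which is exactly why identity checks on values are unsafe.

print(a == b)   # True: value comparison is always the right tool

x = None
print(x is None)  # identity check is the idiomatic test for None
```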
Pass is a no-operation statement. It tells Python, "there's nothing here, keep moving." Python requires at least one statement in certain blocks like function bodies, class definitions, and if branches. Pass satisfies that requirement without doing anything.
Real-world situations where pass makes sense:

- Stubbing out functions or classes you plan to implement later
- Defining custom exception classes that need no body
- Intentionally ignoring a specific, expected exception in a try/except block
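A minimal sketch of those three situations (the module and function names are hypothetical):

```python
class PaymentError(Exception):
    """Custom exception type: the class body needs no code."""
    pass


def export_report(data):
    # Stub to be implemented later; pass keeps the file importable.
    pass


try:
    import fast_json_parser  # hypothetical optional dependency
except ImportError:
    pass  # deliberately ignore: fall back to the stdlib parser instead
```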
Both are ways to pass a variable number of arguments to a function.
*args collects extra positional arguments into a tuple:
```python
def add(*args):
    return sum(args)

add(1, 2, 3)  # returns 6
```

**kwargs collects extra keyword arguments into a dictionary:

```python
def display(**kwargs):
    for key, value in kwargs.items():
        print(f"{key}: {value}")

display(name="Alice", role="Engineer")
```
Use *args when you don't know how many positional values will be passed. Use **kwargs when you don't know which keyword arguments will be passed. You can use both in the same function, but *args must come before **kwargs in the signature.
A function with return executes, produces a value, and exits. The next time you call it, it starts from the beginning.
A function with yield produces a value, pauses execution, and resumes from where it left off the next time it's called. A function with yield is a generator function, and calling it returns a generator object.
```python
def count_up(n):
    for i in range(n):
        yield i

gen = count_up(3)
next(gen)  # 0
next(gen)  # 1
next(gen)  # 2
```
Generators are memory-efficient because they produce values one at a time rather than building an entire list in memory upfront. This makes them particularly useful when working with large datasets or infinite sequences.
A module is a single Python file containing functions, classes, and variables. A package is a directory containing multiple modules along with an __init__.py file that tells Python to treat the directory as a package.
When you write import mymodule, Python searches for it in this order:

1. The directory containing the script being run (or the current directory in interactive mode)
2. The directories listed in the PYTHONPATH environment variable
3. The standard library directories
4. The site-packages directories where third-party packages are installed
This search order is stored in sys.path, which you can inspect and modify at runtime if needed.
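A quick look at sys.path in practice (the appended directory is a hypothetical example):

```python
import sys

# sys.path is a plain list of directories, searched in order.
for entry in sys.path:
    print(entry)

# You can extend it at runtime. Useful in scripts and notebooks,
# though proper packaging is the better long-term fix.
sys.path.append("/opt/myproject/libs")  # hypothetical path
```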
These questions go deeper than syntax. They test whether you understand how Python manages objects, memory, and code structure. Expect these in the second round of most Python interviews, especially for backend and data engineering roles.
Python manages memory through a private heap: a dedicated area of memory that Python controls entirely, separate from the system memory your OS manages.
Here's how the three layers work together:

1. At the bottom, the raw memory allocator requests blocks from the operating system.
2. On top of that, Python's object allocator (pymalloc) efficiently manages small objects (512 bytes or less), reusing pools of memory to avoid repeated system calls.
3. At the top, object-specific allocators handle types like ints, lists, and dicts. Reference counting, backed by a cyclic garbage collector, frees objects once nothing refers to them.
Why this matters in practice: when you're building long-running services or processing large datasets, understanding this model helps you write code that doesn't quietly accumulate memory over time.
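You can observe reference counting and the cyclic garbage collector directly from the standard library:

```python
import gc
import sys

data = [1, 2, 3]
# getrefcount reports one extra reference (its own argument).
print(sys.getrefcount(data))

alias = data
print(sys.getrefcount(data))  # one higher: a second name now refers to it

# Reference counting alone can't free cycles; the gc module handles those.
a = {}
b = {"other": a}
a["other"] = b      # a and b now reference each other
del a, b
collected = gc.collect()   # reclaims the unreachable cycle
print(collected)           # number of objects the collector found
```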
Both create a new object, but they handle nested objects differently.
Shallow copy creates a new container but keeps references to the same inner objects:
```python
import copy

original = [[1, 2], [3, 4]]
shallow = copy.copy(original)
shallow[0][0] = 99
print(original)  # [[99, 2], [3, 4]] -- original is affected
```
Deep copy creates a completely independent copy of the object and everything nested inside it:
```python
original = [[1, 2], [3, 4]]
deep = copy.deepcopy(original)
deep[0][0] = 99
print(original)  # [[1, 2], [3, 4]] -- original is not affected
```
The bug shallow copies cause is subtle and common: you think you have an independent copy, but modifying a nested object modifies the original too. Use deep copy whenever your data structure contains mutable nested objects that you need to modify independently.
A dictionary is a collection of key-value pairs with O(1) average-case lookup, insertion, and deletion, made possible by a hash table under the hood.
Before Python 3.7, dictionaries did not guarantee any particular order. The internal hash table stored items based on hash values, and the iteration order was unpredictable.
From Python 3.7 onwards, dictionaries officially maintain insertion order as part of the language specification. The implementation changed to use a compact array that preserves the sequence in which keys were added, while still maintaining the hash table for fast lookups.
This matters in practice whenever you're iterating over a dictionary, and the order of results is meaningful. For example, when building an ordered configuration or tracking the sequence of events.
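A quick demonstration of the ordering guarantee:

```python
events = {}
events["login"] = "09:00"
events["upload"] = "09:05"
events["logout"] = "09:30"

# Since Python 3.7, iteration order matches insertion order.
print(list(events))   # ['login', 'upload', 'logout']

# Re-assigning a value keeps the key's original position.
events["login"] = "09:01"
print(list(events))   # order unchanged
```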
A class is defined using the class keyword. The __init__ method runs automatically when you create an instance and is used to set up the object's initial state.
```python
class Employee:
    def __init__(self, name, role):
        self.name = name
        self.role = role

    def describe(self):
        return f"{self.name} works as a {self.role}"

emp = Employee("Alice", "Engineer")
print(emp.describe())
```
self refers to the specific instance the method is being called on. When you write emp.describe(), Python automatically passes emp as the first argument to describe; that's what self receives. It's not a keyword, just a strong convention. You could name it anything, but you shouldn't.
Inheritance lets one class (the child) acquire the attributes and methods of another (the parent).
```python
class Animal:
    def speak(self):
        return "Some sound"

class Dog(Animal):
    def speak(self):
        return "Woof"
```
Python also supports multiple inheritance. A class can inherit from more than one parent:
```python
class Flyable:
    def move(self):
        return "Flying"

class Swimmable:
    def move(self):
        return "Swimming"

class Duck(Flyable, Swimmable):
    pass

d = Duck()
print(d.move())  # "Flying" -- Flyable comes first in the MRO
```
When multiple parents define the same method, Python uses the Method Resolution Order (MRO) to decide which one wins. The MRO follows the C3 linearization algorithm, left-to-right through the parent list, depth before breadth.
The GIL (Global Interpreter Lock) is a mutex: a lock that allows only one thread to execute Python bytecode at a time, even on a multi-core machine.
It exists because CPython's memory management (specifically, reference counting) is not thread-safe. Without the GIL, two threads modifying the same object's reference count simultaneously could corrupt memory.
Practical implications:

- Multithreading does not speed up CPU-bound Python code, because only one thread runs bytecode at a time.
- I/O-bound workloads (network calls, disk reads) still benefit from threads, because the GIL is released while a thread waits on I/O.
- For true CPU parallelism, use multiprocessing: each process gets its own interpreter and its own GIL.
- C extensions like NumPy release the GIL during heavy computation, which is one reason they're fast.
This is one of the most important Common Python Interview Questions for backend and data engineering roles because it directly affects how you architect concurrent systems in Python.
These three method types serve different purposes:
Instance method (regular): takes self, operates on a specific instance, and can read or modify instance state.

@classmethod: takes cls instead of self, operates on the class itself, and is commonly used for alternative constructors.

@staticmethod: takes neither self nor cls. It's a plain function grouped inside the class because it logically belongs there.
```python
class Date:
    def __init__(self, day, month, year):
        self.day = day
        self.month = month
        self.year = year

    @classmethod
    def from_string(cls, date_string):
        day, month, year = map(int, date_string.split('-'))
        return cls(day, month, year)

    @staticmethod
    def is_valid_year(year):
        return year > 0
```
A decorator is a function that takes another function as input, wraps it with additional behavior, and returns the wrapped version. The @ syntax is just shorthand for passing the function through the decorator.
```python
def log_call(func):
    def wrapper(*args, **kwargs):
        print(f"Calling {func.__name__}")
        result = func(*args, **kwargs)
        print("Done")
        return result
    return wrapper

@log_call
def process_data(data):
    return data.upper()
```
This is equivalent to writing process_data = log_call(process_data).
Real-world uses of decorators in production:

- Logging function calls and measuring execution time
- Retrying failed network or database operations
- Enforcing authentication and permission checks on endpoints
- Caching expensive results (functools.lru_cache is itself a decorator)
A generator is a function that uses yield to produce values one at a time, pausing between each one. Instead of building an entire list in memory and returning it, it produces each value on demand.
Why this matters for memory:
```python
# This creates a list of 1 million integers in memory all at once
numbers = [x * 2 for x in range(1_000_000)]

# This creates a generator that produces one value at a time
numbers = (x * 2 for x in range(1_000_000))
```
The list consumes roughly 8MB. The generator uses almost nothing. It only holds the state needed to produce the next value.
Use a generator over a list comprehension when:

- The dataset is large or unbounded and you only need one pass over it
- You're chaining processing steps and want to stream values through them
- You don't need random access, len(), or to iterate more than once
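The size difference is easy to measure with sys.getsizeof (exact byte counts vary by Python version and platform):

```python
import sys

as_list = [x * 2 for x in range(1_000_000)]
as_gen = (x * 2 for x in range(1_000_000))

print(sys.getsizeof(as_list))  # several MB of pointers
print(sys.getsizeof(as_gen))   # a small constant, regardless of the range

# Generators also compose well with streaming consumers:
total = sum(x * 2 for x in range(1_000_000))  # never builds the full list
```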
A namespace is a mapping from names to objects. Python uses namespaces to keep variable names from different scopes from colliding with each other.
The LEGB rule defines the order Python searches for a variable name:
```python
x = "global"

def outer():
    x = "enclosing"
    def inner():
        x = "local"
        print(x)  # prints "local"
    inner()

outer()
```
Python searches from the inside out. Local first, then Enclosing, then Global, then Built-in. The first match wins. If no match is found anywhere, Python raises a NameError.
This section covers the questions you'll face in data engineering, data science, and analytics interviews. If you're targeting a data-focused role, this is your highest-priority preparation area. Interviewers in these rounds often hand you a dataset and ask you to work with it live, knowing the theory isn't enough, you need to be comfortable writing these operations from memory.
A DataFrame is a two-dimensional, labeled data structure. Think of it as a table with named columns and indexed rows. It's the primary data structure for working with structured data in Python.
Here's how it differs from native Python structures:

- Columns have names and consistent types, unlike a list of lists
- Operations are vectorized and run on whole columns at once, instead of explicit loops
- Rows are aligned on a labeled index, so combining datasets doesn't depend on position
In a data pipeline, you'd use a DataFrame when you need to:

- Load and clean tabular data from CSVs, databases, or APIs
- Filter, join, group, and aggregate records
- Reshape data before handing it to a reporting tool or a model
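A minimal sketch of those operations (the column names and values are invented for illustration):

```python
import pandas as pd

# A tiny DataFrame with named columns and an integer index.
df = pd.DataFrame({
    "user": ["alice", "bob", "carol"],
    "spend": [120.0, 80.5, 210.0],
})

# Column selection, row filtering, and aggregation in a few lines:
big_spenders = df[df["spend"] > 100]
print(big_spenders["user"].tolist())  # ['alice', 'carol']
print(df["spend"].mean())
```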
Missing values in Pandas are represented as NaN (Not a Number) for numeric data or None for object types. There are three main approaches:
dropna() removes rows or columns containing missing values:
```python
df.dropna()          # drop rows with any NaN
df.dropna(axis=1)    # drop columns with any NaN
df.dropna(thresh=3)  # keep rows with at least 3 non-NaN values
```
Use this when missing data is random and the rows or columns with missing values aren't important to your analysis.
fillna() replaces missing values with a specified value or strategy:
```python
df.fillna(0)               # replace NaN with 0
df.fillna(method='ffill')  # forward fill (use df.ffill() in newer Pandas)
df.fillna(df.mean())       # fill with column mean
```
Use this when you want to preserve all rows and have a reasonable substitute value available.
Imputation replaces missing values with statistically derived values using tools like sklearn's SimpleImputer or IterativeImputer. Use this in machine learning pipelines where the quality of the replacement value matters more than simplicity.
The right choice depends on why the data is missing. If it's missing at random and the dataset is large, dropping is fine. If missing values carry meaning or the dataset is small, filling or imputing is better.
There are three main methods, and they serve different purposes:
merge() works like a SQL JOIN, combining rows from two DataFrames based on matching values in one or more columns:
```python
pd.merge(df1, df2, on='user_id', how='inner')
# how options: inner, left, right, outer
```
Use this when combining datasets that share a common key column.
join() is a shortcut for merge that works on index values by default:
```python
df1.join(df2, how='left')
```
Use this when your DataFrames share the same index and you want a concise syntax.
concat() stacks DataFrames either vertically (adding more rows) or horizontally (adding more columns):
```python
pd.concat([df1, df2], axis=0)  # stack rows
pd.concat([df1, df2], axis=1)  # stack columns
```
Use this when combining datasets with the same structure, for example, combining monthly reports into an annual dataset.
Reindexing means conforming a DataFrame or Series to a new index. It's how you align data to a specific set of labels, whether those labels currently exist in the data or not.
```python
df = pd.DataFrame({'score': [85, 90, 78]}, index=['Alice', 'Bob', 'Carol'])
new_index = ['Alice', 'Bob', 'Carol', 'Dave']
df_reindexed = df.reindex(new_index)
```
In this example, Dave doesn't exist in the original DataFrame. Pandas fills the missing row with NaN by default. You can override this with a fill_value parameter.
When reindexing is useful:

- Aligning a DataFrame to a master list of expected labels (every employee, every product SKU)
- Filling out a time series so every date appears, even dates with no data
- Forcing a consistent row order across multiple reports before comparing them
For loading, NumPy's loadtxt() and genfromtxt() handle CSV files directly:
```python
import numpy as np

data = np.genfromtxt('data.csv', delimiter=',', skip_header=1)
```
genfromtxt() handles missing values more gracefully than loadtxt(), making it the better choice for real-world data.
Once loaded, common operations:
```python
# Sort by first column
sorted_data = data[data[:, 0].argsort()]

# Filter rows where column 2 > 50
filtered = data[data[:, 2] > 50]

# Reshape from 2D to 3D
reshaped = data.reshape(10, 5, -1)
```

For very large files that don't fit in memory, load in chunks using Pandas with chunksize and convert each chunk to NumPy:

```python
for chunk in pd.read_csv('large_file.csv', chunksize=10000):
    arr = chunk.to_numpy()
    # process arr
```
This is one of the most direct Python Interview Questions for Data Engineer roles because the answer explains why the entire data science ecosystem is built on NumPy rather than plain Python.
Key advantages:

- Vectorized operations run in compiled C, often orders of magnitude faster than Python loops
- Arrays store fixed-type elements in contiguous memory, using far less space than lists of Python objects
- Broadcasting lets you combine arrays of different shapes without writing loops
```python
# Without NumPy -- slow Python loop
result = [a + b for a, b in zip(list1, list2)]

# With NumPy -- fast vectorized operation
result = array1 + array2
```
NumPy doesn't support in-place column deletion directly. You use np.delete() to create a new array without the target column, then np.insert() or np.column_stack() to add the replacement:
```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Delete column at index 1 (second column)
arr_deleted = np.delete(arr, 1, axis=1)
# Result: [[1, 3], [4, 6], [7, 9]]

# New column values
new_col = np.array([[20], [50], [80]])

# Insert at position 1
arr_replaced = np.insert(arr_deleted, 1, new_col.flatten(), axis=1)
# Result: [[1, 20, 3], [4, 50, 6], [7, 80, 9]]
```
The key thing to be aware of is that np.delete() and np.insert() return new arrays; they don't modify the original. If you're working with large arrays, be conscious of the memory cost of creating multiple copies during this process.
A publicly shared Google Sheet can be accessed in CSV format by modifying its sharing URL. The standard approach:
```python
import pandas as pd

sheet_id = "your_sheet_id_here"
sheet_name = "Sheet1"
url = f"https://docs.google.com/spreadsheets/d/{sheet_id}/gviz/tq?tqx=out:csv&sheet={sheet_name}"
df = pd.read_csv(url)
```
This works because Google Sheets exposes a CSV export endpoint for publicly shared documents. No authentication is needed for public sheets.
For private sheets, you'd use the Google Sheets API with the google-auth and gspread libraries, which require OAuth2 credentials.
This is one of the most practically important Python Interview Questions for Data Engineer roles. There are several approaches depending on the file type and use case:
Chunked reading with Pandas:
```python
chunk_iter = pd.read_csv('large_file.csv', chunksize=50000)
results = []
for chunk in chunk_iter:
    filtered = chunk[chunk['value'] > 100]
    results.append(filtered)
final_df = pd.concat(results)
```
Python generators for line-by-line processing:
```python
def read_large_file(filepath):
    with open(filepath, 'r') as f:
        for line in f:
            yield line.strip()

for line in read_large_file('large_file.txt'):
    process(line)
```
Dask for out-of-memory DataFrames: Dask provides a Pandas-like API that processes data in chunks automatically, making it suitable for datasets that are too large for RAM but too structured for line-by-line processing.
The general principle: never load more than you need at once. Read in chunks, process each chunk, and accumulate only the results you need.
Items in Series A but not in Series B:
```python
import pandas as pd

a = pd.Series([1, 2, 3, 4, 5])
b = pd.Series([3, 4, 5, 6, 7])

in_a_not_b = a[~a.isin(b)]
# Result: 1, 2
```
Items not common to both (symmetric difference):
```python
not_common = pd.Series(list(set(a).symmetric_difference(set(b))))
# Result: 1, 2, 6, 7
```
You can also use numpy's setdiff1d and setxor1d for the same operations on arrays:
```python
import numpy as np

np.setdiff1d(a, b)  # in a but not b
np.setxor1d(a, b)   # not common to both
```
The isin() approach is generally faster for large Series because it's vectorized. The set-based approach is cleaner for smaller datasets where readability matters more than raw speed.
This is the section most candidates either love or dread. Live coding rounds test whether you can translate clear thinking into working code under pressure. The good news is that most Python Coding Interview Questions follow repeatable patterns. Once you recognize the pattern, the solution becomes much more approachable.
For each problem below, focus on understanding the approach and the reasoning behind it, not just the code.
The naive approach uses two nested loops. For each number, check every other number. This works but runs in O(n²) time.
The efficient approach uses a dictionary to store numbers you've already seen:
```python
def two_sum(nums, target):
    seen = {}
    for i, num in enumerate(nums):
        complement = target - num
        if complement in seen:
            return [seen[complement], i]
        seen[num] = i
```
For each number, you calculate what value you'd need to complete the pair, then check if you've seen it already. Dictionary lookups are O(1), so the overall solution runs in O(n) time with O(n) space.
The key insight: instead of looking forward for a match, store what you've already seen and check backward.
The right data structure here is a stack. Every time you see an opening bracket, push it. Every time you see a closing bracket, check whether it matches the most recent opening bracket.
```python
def is_valid(s):
    stack = []
    mapping = {')': '(', '}': '{', ']': '['}

    for char in s:
        if char in mapping:
            top = stack.pop() if stack else '#'
            if mapping[char] != top:
                return False
        else:
            stack.append(char)
    return not stack
```
The stack is empty at the end only if every opening bracket was properly closed in the right order. Time complexity is O(n), space is O(n).
The efficient approach uses a sliding window with two pointers and a set to track characters in the current window:
```python
def length_of_longest_substring(s):
    char_set = set()
    left = 0
    max_length = 0

    for right in range(len(s)):
        while s[right] in char_set:
            char_set.remove(s[left])
            left += 1
        char_set.add(s[right])
        max_length = max(max_length, right - left + 1)

    return max_length
```
The right pointer expands the window. When a duplicate is found, the left pointer shrinks the window until the duplicate is gone. This runs in O(n) time because each character is added and removed from the set at most once.
The key insight is finding a hashable representation that's identical for all anagrams of the same word. Sorting the characters works perfectly: every anagram of "eat" sorts to "aet".
```python
from collections import defaultdict

def group_anagrams(strs):
    groups = defaultdict(list)

    for word in strs:
        key = tuple(sorted(word))
        groups[key].append(word)

    return list(groups.values())
```
The sorted characters become a tuple (tuples are hashable, lists are not) which serves as the dictionary key. All words with the same sorted key end up in the same group. Time complexity is O(n * k log k) where k is the maximum word length.
An LRU cache evicts the least recently used item when it's full. Getting O(1) for both operations requires combining two data structures:
```python
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.cache = OrderedDict()
        self.capacity = capacity

    def get(self, key):
        if key not in self.cache:
            return -1
        self.cache.move_to_end(key)
        return self.cache[key]

    def put(self, key, value):
        if key in self.cache:
            self.cache.move_to_end(key)
        self.cache[key] = value
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)
```
Python's OrderedDict handles the linked list behavior internally. move_to_end() marks an item as recently used. popitem(last=False) removes the oldest item when capacity is exceeded.
The idea: use two pointers moving at different speeds. If there's a cycle, the fast pointer will eventually lap the slow pointer, and they'll meet. If there's no cycle, the fast pointer reaches the end.
```python
def has_cycle(head):
    slow = head
    fast = head

    while fast and fast.next:
        slow = slow.next
        fast = fast.next.next
        if slow == fast:
            return True

    return False
```
This runs in O(n) time and uses O(1) space. That's the key advantage over a hash-set approach, which would use O(n) space to track visited nodes. The fast pointer moves two steps at a time, the slow pointer moves one. In a cycle, the gap between them decreases by one each iteration until they meet.
This is a recursive problem. For each key-value pair, if the value is a dictionary, recurse deeper. If it isn't, add the accumulated key path to the result.
```python
def flatten_dict(d, parent_key='', separator='.'):
    items = {}

    for key, value in d.items():
        new_key = f"{parent_key}{separator}{key}" if parent_key else key

        if isinstance(value, dict):
            items.update(flatten_dict(value, new_key, separator))
        else:
            items[new_key] = value

    return items

# Example
nested = {"a": {"b": {"c": 1}, "d": 2}, "e": 3}
print(flatten_dict(nested))
# {"a.b.c": 1, "a.d": 2, "e": 3}
```
The parent_key accumulates the path as you recurse deeper. When you hit a non-dict value, the full path becomes the key in the flat result.
The clean Python approach uses Counter from the collections module:
```python
from collections import Counter

def top_k_frequent(nums, k):
    count = Counter(nums)
    return [item for item, freq in count.most_common(k)]
```
Counter.most_common(k) returns the k elements with the highest counts in descending order. It uses a heap internally, giving O(n log k) time complexity, more efficient than sorting all counts when k is much smaller than n.
If you can't use Counter, the manual approach builds a frequency dictionary, then uses heapq.nlargest():
```python
import heapq

def top_k_frequent_manual(nums, k):
    freq = {}
    for num in nums:
        freq[num] = freq.get(num, 0) + 1
    return heapq.nlargest(k, freq, key=freq.get)
```
The mathematical approach uses the fact that the sum of integers from 1 to N equals N * (N + 1) / 2. The difference between that expected sum and the actual sum of your list is the missing number.
```python
def find_missing(nums):
    n = len(nums) + 1
    expected_sum = n * (n + 1) // 2
    return expected_sum - sum(nums)
```
This runs in O(n) time and O(1) space, no sorting, no sets, no extra memory proportional to the input size. The XOR approach is an alternative that also runs in O(n) time and O(1) space, but the sum approach is easier to explain in an interview setting.
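For completeness, here is a sketch of that XOR alternative. A number XORed with itself cancels to zero, so XORing every expected value 1..N against every actual value leaves only the missing number:

```python
def find_missing_xor(nums):
    result = 0
    # XOR every expected value 1..N (N = len(nums) + 1)...
    for i in range(1, len(nums) + 2):
        result ^= i
    # ...then XOR every actual value; pairs cancel, the gap survives.
    for num in nums:
        result ^= num
    return result
```

This avoids any risk of integer overflow in languages with fixed-width ints, though in Python the sum formula is just as safe.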
The key here is processing the file line by line rather than loading it all into memory, combined with a counter for efficient frequency tracking:
```python
from collections import Counter
import re

def most_frequent_word(filepath):
    word_counts = Counter()

    with open(filepath, 'r') as f:
        for line in f:
            words = re.findall(r'\b[a-z]+\b', line.lower())
            word_counts.update(words)

    return word_counts.most_common(1)[0][0]
```
Processing line by line means memory usage stays constant regardless of file size. re.findall() with a word boundary pattern handles punctuation and case normalization. Counter.update() accumulates counts incrementally across all lines.
For extremely large files across distributed storage, the production approach would use Apache Spark or a MapReduce pattern. But for a single large file in an interview context, this solution demonstrates the right memory-aware thinking.
This section separates mid-level candidates from senior ones. The questions here don't just test whether you know a feature exists; they test whether you understand why it exists, how it works internally, and when you'd actually reach for it in production code.
This is one of those Python Coding Interview Questions and Answers topics where a lot of candidates use the terms interchangeably and get caught out.
Here's the precise distinction:

An iterator is any object that implements the iterator protocol: __iter__() (which returns the iterator itself) and __next__() (which returns the next value and raises StopIteration when exhausted).

A generator is a special kind of iterator created by a function that uses yield. Generators automatically implement __iter__() and __next__() behind the scenes.
```python
# Custom iterator -- manual implementation
class CountUp:
    def __init__(self, limit):
        self.limit = limit
        self.current = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.current >= self.limit:
            raise StopIteration
        self.current += 1
        return self.current

# Generator -- same behavior, much less code
def count_up(limit):
    for i in range(1, limit + 1):
        yield i
```
Every generator is an iterator because it implements both required methods. But not every iterator is a generator. A class-based iterator like CountUp above is an iterator but not a generator. Generators are simply the most convenient way to create iterators in Python.
A context manager is an object that defines setup and teardown behavior for a block of code. The with statement handles the setup before the block runs and the teardown after it finishes, even if an exception occurs inside the block.
Under the hood, the with statement calls __enter__() at the start and __exit__() at the end.
Class-based approach:
```python
class ManagedFile:
    def __init__(self, filepath):
        self.filepath = filepath

    def __enter__(self):
        self.file = open(self.filepath, 'r')
        return self.file

    def __exit__(self, exc_type, exc_val, exc_tb):
        self.file.close()
        return False  # don't suppress exceptions

with ManagedFile('data.txt') as f:
    content = f.read()
```
contextlib approach (simpler for most cases):
```python
from contextlib import contextmanager

@contextmanager
def managed_file(filepath):
    f = open(filepath, 'r')
    try:
        yield f  # everything before yield is setup, everything after is teardown
    finally:
        f.close()

with managed_file('data.txt') as f:
    content = f.read()
```
The contextlib approach is cleaner for simple cases. The class-based approach is better when the context manager needs to maintain state across multiple uses or needs more complex exception handling logic.
Duck typing comes from the saying "if it walks like a duck and quacks like a duck, it's a duck." In Python, the type of an object matters less than whether it has the methods or attributes you need.
```python
def process(data):
    for item in data:
        print(item)

process([1, 2, 3])          # works -- list is iterable
process((1, 2, 3))          # works -- tuple is iterable
process("hello")            # works -- string is iterable
process({"a": 1, "b": 2})   # works -- dict is iterable (iterates keys)
```
The process() function doesn't check whether data is a list or a tuple. It just tries to iterate over it. If the object supports iteration, it works. If it doesn't, Python raises an AttributeError or TypeError at runtime.
How this affects how you write functions:
Rather than writing if isinstance(data, list) checks, you write code that assumes the object has the behavior you need and handles the exception if it doesn't. This makes Python functions naturally more flexible and reusable across different input types.
The answer comes back to the GIL.
Use multithreading when:

- The task is I/O-bound: network requests, disk reads, database calls
- Threads spend most of their time waiting, during which the GIL is released
```python
import threading
import requests

def fetch_url(url):
    # I/O-bound -- threading works well here
    response = requests.get(url)
    return response.status_code

threads = [threading.Thread(target=fetch_url, args=(url,)) for url in urls]
```
Use multiprocessing when:

- The task is CPU-bound: number crunching, parsing, image processing
- You need true parallelism across cores, since each process has its own GIL
```python
from multiprocessing import Pool

def process_chunk(data_chunk):
    # CPU-bound -- multiprocessing gives real parallelism
    return [x ** 2 for x in data_chunk]

with Pool(processes=4) as pool:
    results = pool.map(process_chunk, chunks)
```
The practical rule is straightforward: I/O-bound tasks use threads, CPU-bound tasks use processes. Mixing them up is one of the most common performance mistakes in Python concurrency.
MRO is the order in which Python searches through a class hierarchy to find a method or attribute. It matters most in multiple inheritance scenarios where the same method name exists in more than one parent class.
Python uses the C3 linearization algorithm to determine MRO. The result always follows these rules:

- A child class always comes before its parents
- Parents appear in the order they're listed in the class definition (left to right)
- Each class appears exactly once in the MRO
```python
class A:
    def hello(self):
        return "A"

class B(A):
    def hello(self):
        return "B"

class C(A):
    def hello(self):
        return "C"

class D(B, C):
    pass

print(D.__mro__)
# (D, B, C, A, object)
print(D().hello())
# "B" -- B comes first in the MRO
```
You can inspect the MRO of any class using ClassName.__mro__ or ClassName.mro(). When designing class hierarchies, it's worth checking the MRO explicitly to confirm which method will actually be called when names collide across parents.
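The MRO also governs super(): each call moves one step along the linearization, not simply to the defining class's parent. A small sketch of the same diamond hierarchy with cooperative super() calls added (the "->" chaining is purely illustrative):

```python
class A:
    def hello(self):
        return "A"

class B(A):
    def hello(self):
        # super() here resolves to the NEXT class in the instance's MRO,
        # which for a D instance is C, not A
        return "B->" + super().hello()

class C(A):
    def hello(self):
        return "C->" + super().hello()

class D(B, C):
    def hello(self):
        return "D->" + super().hello()

print([cls.__name__ for cls in D.mro()])  # ['D', 'B', 'C', 'A', 'object']
print(D().hello())                        # D->B->C->A
```

Every class in the diamond runs exactly once, in MRO order, which is the point of C3 linearization.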
A metaclass is the class of a class. Just as a regular class defines how its instances behave, a metaclass defines how a class itself behaves: how it's created, what attributes it has, and what happens when you subclass it.
In Python, the default metaclass for all classes is type. When you write class MyClass: pass, Python is effectively calling type('MyClass', (object,), {}) behind the scenes.
```python
class SingletonMeta(type):
    _instances = {}

    def __call__(cls, *args, **kwargs):
        if cls not in cls._instances:
            cls._instances[cls] = super().__call__(*args, **kwargs)
        return cls._instances[cls]

class DatabaseConnection(metaclass=SingletonMeta):
    pass

db1 = DatabaseConnection()
db2 = DatabaseConnection()
print(db1 is db2)  # True ... same instance returned both times
```
When metaclasses are actually used in production:
- ORM frameworks: Django models use a metaclass to turn class attributes into database field definitions
- Abstract base classes: abc.ABCMeta enforces that required methods are implemented before instantiation
- Automatic registration or validation of subclasses in plugin systems and serialization frameworks
Metaclasses are powerful but complex. For most use cases, class decorators or __init_subclass__() are simpler alternatives that achieve the same result with less indirection.
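As a rough illustration of that simpler alternative, here is subclass registration, a classic metaclass use case, done with __init_subclass__ instead. The Plugin and loader class names are hypothetical:

```python
class Plugin:
    registry = {}

    def __init_subclass__(cls, **kwargs):
        # Runs automatically every time Plugin is subclassed;
        # no metaclass needed for this kind of registration
        super().__init_subclass__(**kwargs)
        cls.registry[cls.__name__] = cls

class CsvLoader(Plugin):
    pass

class JsonLoader(Plugin):
    pass

print(sorted(Plugin.registry))  # ['CsvLoader', 'JsonLoader']
```

Each subclass registers itself at definition time, with far less machinery than a custom metaclass.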
Late binding means Python closures look up variable values at the time the function is called, not at the time it's defined.
This produces a classic bug when creating functions inside a loop:
```python
functions = []
for i in range(5):
    functions.append(lambda: i)

print([f() for f in functions])
# [4, 4, 4, 4, 4] ... not [0, 1, 2, 3, 4]
```
All five lambdas reference the same variable i. By the time any of them are called, the loop has finished and i equals 4. Every lambda returns 4.
The fix is to capture the current value of i at definition time using a default argument:

```python
functions = []
for i in range(5):
    functions.append(lambda x=i: x)

print([f() for f in functions])  # [0, 1, 2, 3, 4]
```
Default argument values are evaluated at function definition time, not call time. So each lambda captures its own copy of i's value at the moment it was created.
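An equivalent fix, for readers who find default arguments in lambdas unclear, is functools.partial, which binds the current value of i explicitly at creation time:

```python
from functools import partial

def identity(x):
    return x

# partial(identity, i) freezes the value of i for each function object
functions = [partial(identity, i) for i in range(5)]

print([f() for f in functions])  # [0, 1, 2, 3, 4]
```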
By default, Python stores instance attributes in a per-object dictionary called __dict__. This gives you the flexibility to add attributes dynamically but comes with memory overhead: the dictionary itself takes space, and each attribute lookup involves dictionary operations.

__slots__ replaces __dict__ with a fixed set of attributes defined at class creation time:
```python
class Point:
    __slots__ = ['x', 'y']

    def __init__(self, x, y):
        self.x = x
        self.y = y

p = Point(1, 2)
p.z = 3  # raises AttributeError ... z is not in __slots__
```
Benefits of __slots__:
- Noticeably lower memory use per instance, since no per-instance __dict__ is allocated
- Slightly faster attribute access
- Typos in attribute names raise AttributeError instead of silently creating new attributes

Trade-offs:
- Attributes can't be added dynamically
- Multiple inheritance gets awkward when more than one base class defines non-empty __slots__
- Instances can't be weakly referenced unless '__weakref__' is included in __slots__

Use __slots__ when you're creating large numbers of instances of a class with a fixed, known set of attributes. Configuration objects, data records, and coordinate or point classes are common examples.
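A quick way to see where the saving comes from: a slotted instance simply has no per-instance __dict__ at all. A minimal comparison sketch (both Point classes here are illustrative):

```python
class PlainPoint:
    def __init__(self, x, y):
        self.x, self.y = x, y

class SlotPoint:
    __slots__ = ('x', 'y')

    def __init__(self, x, y):
        self.x, self.y = x, y

plain = PlainPoint(1, 2)
slotted = SlotPoint(1, 2)

# The per-instance dict is the memory cost __slots__ eliminates
print(hasattr(plain, '__dict__'))    # True
print(hasattr(slotted, '__dict__'))  # False
```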
Monkey patching means dynamically modifying a class or module at runtime: replacing or adding attributes, methods, or behaviors after the code has been loaded.
A legitimate use case (patching in tests):
```python
import requests

def mock_get(url):
    class MockResponse:
        status_code = 200
        def json(self):
            return {"result": "mocked"}
    return MockResponse()

# In a test
requests.get = mock_get

response = requests.get("https://api.example.com/data")
print(response.json())  # {"result": "mocked"}
```
This lets you test code that makes HTTP requests without making real network calls. Libraries like unittest.mock provide a cleaner, more controlled way to do the same thing.
Risks in production code:
- The change is global: every piece of code using the patched object sees the new behavior, not just the code that needed it
- Debugging becomes harder because runtime behavior no longer matches the source
- Patches can silently break when the patched library's internals change in a new version
The general principle: monkey patching is acceptable in tests, questionable in application code, and almost always a sign of a design problem in production systems.
Both allow a program to work on multiple things without waiting for each one to finish. But they do it in completely different ways.
Multithreading uses OS-managed threads. The OS switches between threads, and each thread can be interrupted at any point. This context switching has overhead, and shared state between threads requires locks to prevent race conditions.
Async/await uses cooperative concurrency. A single thread runs an event loop, and tasks voluntarily yield control when they're waiting for something (like a network response). There's no OS context switching and no shared state problems because only one coroutine runs at a time.
```python
import asyncio
import aiohttp

async def fetch(session, url):
    async with session.get(url) as response:
        return await response.text()

async def main():
    async with aiohttp.ClientSession() as session:
        results = await asyncio.gather(
            fetch(session, "https://api1.example.com"),
            fetch(session, "https://api2.example.com"),
        )
        return results
```
When async/await is the better choice:
- Handling thousands of concurrent connections: API gateways, scrapers, chat servers, websockets
- Workloads dominated by waiting on I/O, where async-aware libraries exist (aiohttp, asyncpg)

Where it offers no advantage:
- CPU-bound work, since the event loop runs on a single thread and a single core
- Code built on blocking libraries; one blocking call stalls the entire event loop
The practical summary: async/await handles high-concurrency I/O more efficiently than threads, but only works well when the entire chain of calls is async-compatible.
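Because the aiohttp example needs a real network, here is a self-contained sketch of the same gather pattern, with asyncio.sleep standing in for the I/O waits:

```python
import asyncio

async def work(name, delay):
    # Stands in for an I/O wait such as a network call
    await asyncio.sleep(delay)
    return name

async def main():
    # Both coroutines wait concurrently on one thread, so the total
    # wall time is roughly max(0.1, 0.2), not 0.1 + 0.2
    return await asyncio.gather(work("first", 0.1), work("second", 0.2))

results = asyncio.run(main())
print(results)  # ['first', 'second']
```

asyncio.gather preserves argument order in its results regardless of which coroutine finishes first.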
These are the questions asked at the senior and lead engineer levels. They go beyond knowing what Python features do. They test whether you can make sound architectural decisions, write production-ready code, and reason clearly about performance, reliability, and maintainability.
The first rule of optimization is: don't guess. Profile first, then fix what the data tells you is slow.
Step 1 – Profile with cProfile:
```python
import cProfile

cProfile.run('your_function()')
```

cProfile gives you a breakdown of how much time was spent in each function call. Focus on the functions with the highest cumulative time, not just the ones called most frequently.

Step 2 – Line-level profiling with line_profiler:

```python
# Install: pip install line_profiler

@profile  # decorator injected by kernprof; mark the function you want to profile
def slow_function():
    ...

# Run with: kernprof -l -v script.py
```
Common bottlenecks and their fixes:
- String building with += inside a loop, which creates a new string object on every iteration: use ''.join(parts) instead
- Membership tests against large lists: convert to a set for constant-time lookups
- Repeated work inside hot loops (recomputed values, repeated attribute lookups): hoist it out

The highest-impact optimizations in most Python scripts come from algorithmic improvements; replacing an O(n²) approach with an O(n log n) one delivers far more than any micro-optimization.
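To see how much a structural change like this matters, compare membership testing on a list (linear scan) against a set (hash lookup). A rough benchmark sketch using timeit:

```python
import timeit

items = list(range(100_000))
as_set = set(items)

# Worst case for the list: the sought value is at the very end
list_time = timeit.timeit(lambda: 99_999 in items, number=200)
set_time = timeit.timeit(lambda: 99_999 in as_set, number=200)

print(f"list: {list_time:.4f}s  set: {set_time:.6f}s")
# The set lookup is typically orders of magnitude faster
```

Exact numbers vary by machine, but the gap grows linearly with the size of the list, which is precisely the O(n) vs O(1) difference.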
Deep imports (also called eager imports) load a module and all its dependencies at import time. This is the default Python behavior.
```python
import pandas as pd  # loads the entire pandas library immediately
```
Lazy imports defer the import until the module is actually needed at runtime:
```python
def process_data(filepath):
    import pandas as pd  # only imported when this function is called
    return pd.read_csv(filepath)
```
Why lazy imports improve startup time:
In large applications, importing everything at startup can add hundreds of milliseconds before the application is ready. With lazy imports, only the modules needed for the current operation are loaded; everything else waits.
Trade-offs:
- Import errors surface at call time rather than at startup, in less obvious places
- A module's dependencies are no longer visible at the top of the file
- The first call that triggers the import pays its full cost, which can show up as a latency spike
A practical middle ground: use lazy imports for heavy optional dependencies (large ML libraries, visualization tools) and keep core imports at the top of the file.
Memory leaks in Python are less common than in languages without garbage collection, but they do happen, particularly in long-running services.
Common causes:
- Unbounded caches and module-level lists or dicts that only ever grow
- Long-lived closures and callbacks that keep large objects referenced
- Reference cycles involving objects with __del__ methods, which complicate collection
- Listener or observer registrations that are never removed in long-running services
Tools for detection:
```python
import tracemalloc

tracemalloc.start()
# ... run your code ...
snapshot = tracemalloc.take_snapshot()
top_stats = snapshot.statistics('lineno')
for stat in top_stats[:5]:
    print(stat)
```
tracemalloc shows you exactly which lines of code are allocating memory and how much. For more complex analysis, memory_profiler and objgraph help you visualize reference counts and identify what's holding objects in memory.
Fixing a reference cycle the GC isn't catching:
Use weakref.ref() to create weak references: references that don't increment an object's reference count, allowing the object to be collected when nothing else holds a strong reference to it.
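A minimal sketch of that behavior, using a hypothetical Node class: the weak reference does not keep its target alive, so once the last strong reference goes away the object is collected and the weak reference starts returning None.

```python
import gc
import weakref

class Node:
    pass

node = Node()
ref = weakref.ref(node)   # does not increment node's reference count

print(ref() is node)  # True ... the target still exists

del node              # drop the only strong reference
gc.collect()          # not strictly needed in CPython, but makes it explicit
print(ref())          # None ... the target has been collected
```

This is why caches built on weakref (e.g. weakref.WeakValueDictionary) don't leak: entries disappear as soon as the rest of the program stops using the objects.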
Design patterns are reusable solutions to recurring software design problems. Python's dynamic nature means some classical patterns are built into the language already, but three are worth knowing explicitly.
Singleton – Ensure only one instance of a class exists:
```python
class DatabaseConnection:
    _instance = None

    def __new__(cls):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance

db1 = DatabaseConnection()
db2 = DatabaseConnection()
print(db1 is db2)  # True
```
Use when you need a single shared resource. Database connections, configuration managers, logging handlers.
Factory – Create objects without specifying the exact class:
```python
class NotificationFactory:
    @staticmethod
    def create(channel):
        if channel == 'email':
            return EmailNotification()
        elif channel == 'sms':
            return SMSNotification()
        raise ValueError(f"Unknown channel: {channel}")

notifier = NotificationFactory.create('email')
```
Use when object creation logic is complex or when you want to decouple the code that uses an object from the code that creates it.
Observer – Notify multiple objects when state changes:
```python
class EventEmitter:
    def __init__(self):
        self._listeners = {}

    def on(self, event, callback):
        self._listeners.setdefault(event, []).append(callback)

    def emit(self, event, data=None):
        for callback in self._listeners.get(event, []):
            callback(data)

emitter = EventEmitter()
emitter.on('data_ready', lambda d: print(f"Processing: {d}"))
emitter.emit('data_ready', {'records': 1000})
```
Use when multiple components need to react to state changes without being tightly coupled to the component generating those changes.
Pythonic alternatives: Many classical patterns are unnecessary in Python. Singleton can be replaced with a module-level variable. The factory can be replaced with a dictionary mapping names to classes. Strategy can be replaced with first-class functions.
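For example, the NotificationFactory above collapses to a dictionary lookup. A sketch with stub notification classes (the class bodies here are placeholders):

```python
class EmailNotification:
    channel = "email"

class SMSNotification:
    channel = "sms"

# A dict mapping names to classes replaces the if/elif factory;
# adding a new channel is one dict entry, not a new branch
NOTIFIERS = {
    "email": EmailNotification,
    "sms": SMSNotification,
}

def create(channel):
    try:
        return NOTIFIERS[channel]()
    except KeyError:
        raise ValueError(f"Unknown channel: {channel}") from None

print(type(create("email")).__name__)  # EmailNotification
```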
Three things are required to make a Python script directly executable from the command line:
Step 1 – Add a shebang line as the first line of the script:
```python
#!/usr/bin/env python3
```
This tells the OS which interpreter to use when the file is executed directly. Using /usr/bin/env python3 rather than a hardcoded path like /usr/bin/python3 makes the script portable across machines with Python installed in different locations.
Step 2 – Make the file executable:
```bash
chmod +x script.py
```
Step 3 – Handle command-line arguments using argparse:
```python
#!/usr/bin/env python3
import argparse

parser = argparse.ArgumentParser(description='Process a data file')
parser.add_argument('filepath', help='Path to input file')
parser.add_argument('--verbose', action='store_true', help='Enable verbose output')
args = parser.parse_args()

print(f"Processing {args.filepath}")
if args.verbose:
    print("Verbose mode enabled")
```
argparse is the right choice over sys.argv for any script with more than one argument. It automatically generates help text, validates input types, and produces clear error messages when required arguments are missing.
dict.get(key) vs dict[key]:
```python
data = {'name': 'Alice'}

print(data['age'])         # raises KeyError ... key doesn't exist
print(data.get('age'))     # returns None ... no error
print(data.get('age', 0))  # returns 0 ... custom default
```
Use dict[key] when the key must exist and its absence is a genuine error. Use dict.get(key) when the key might not be present and you want a fallback value instead of an exception.
The mutable default argument bug:
```python
def add_item(item, collection=[]):
    collection.append(item)
    return collection

print(add_item('a'))  # ['a']
print(add_item('b'))  # ['a', 'b'] ... wait, what?
print(add_item('c'))  # ['a', 'b', 'c'] ... this keeps growing
```
Default argument values are evaluated once when the function is defined, not each time it's called. The same list object is reused across all calls. This is one of the most consistently surprising behaviors in Python for developers who haven't encountered it before.
The fix:
```python
def add_item(item, collection=None):
    if collection is None:
        collection = []
    collection.append(item)
    return collection
```
Use None as the default and create a fresh mutable object inside the function body. This pattern applies to lists, dictionaries, and sets used as default arguments.
Using print() for debugging and monitoring is fine for quick scripts. For anything running in production, Python's logging module is the right tool.
What logging provides that print doesn't:
- Severity levels (DEBUG, INFO, WARNING, ERROR, CRITICAL) that can be filtered at runtime
- Automatic timestamps, module names, and other context via formatters
- Multiple output destinations (console, files, external aggregators) via handlers
- Per-module verbosity control without changing application code
A sensible production logging configuration:
```python
import logging

logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s %(levelname)s %(name)s %(message)s',
    handlers=[
        logging.StreamHandler(),
        logging.FileHandler('app.log')
    ]
)

logger = logging.getLogger(__name__)

logger.info("Application started")
logger.warning("Config file not found, using defaults")
logger.error("Database connection failed", exc_info=True)
```
Using __name__ as the logger name means each module gets its own logger, making it easy to trace which part of the application a log message came from. exc_info=True includes the full traceback in error log entries.
Manual implementation:
```python
import time
import requests

def fetch_with_retry(url, max_retries=5, base_delay=1):
    for attempt in range(max_retries):
        try:
            response = requests.get(url, timeout=10)
            response.raise_for_status()
            return response.json()
        except requests.exceptions.HTTPError as e:
            if e.response.status_code == 404:
                raise  # don't retry on 404 ... it won't resolve itself
            wait_time = base_delay * (2 ** attempt)
            time.sleep(wait_time)
        except requests.exceptions.ConnectionError:
            wait_time = base_delay * (2 ** attempt)
            time.sleep(wait_time)
    raise Exception(f"Failed after {max_retries} attempts")
```
Using the tenacity library for cleaner implementation:
```python
import requests
from tenacity import retry, stop_after_attempt, wait_exponential, retry_if_exception_type

@retry(
    stop=stop_after_attempt(5),
    wait=wait_exponential(multiplier=1, min=1, max=30),
    retry=retry_if_exception_type(requests.exceptions.ConnectionError)
)
def fetch_data(url):
    response = requests.get(url)
    response.raise_for_status()
    return response.json()
```
The key design decision is knowing which errors to retry. Transient errors like connection timeouts and 503 responses are worth retrying. Client errors like 400 and 404 are not; retrying won't change the outcome.
Basic pytest test:
```python
def add(a, b):
    return a + b

def test_add_positive_numbers():
    assert add(2, 3) == 5

def test_add_negative_numbers():
    assert add(-1, -1) == -2

def test_add_zero():
    assert add(0, 5) == 5
```
Run with pytest test_file.py. pytest automatically discovers and runs any function prefixed with test_.
Mock vs stub:
- A stub returns canned data so the code under test can run; the test doesn't inspect it afterwards.
- A mock does that and also records how it was called, so the test can assert on call counts and arguments.
Testing a function that makes API calls:
```python
import requests
from unittest.mock import patch, MagicMock

def get_user(user_id):
    response = requests.get(f"https://api.example.com/users/{user_id}")
    return response.json()

def test_get_user():
    mock_response = MagicMock()
    mock_response.json.return_value = {"id": 1, "name": "Alice"}

    with patch('requests.get', return_value=mock_response):
        result = get_user(1)

    assert result["name"] == "Alice"
```
The patch context manager replaces requests.get with a mock for the duration of the test. The real API is never called. This makes tests fast, reliable, and independent of external services.
This is one of those Advanced Python Interview Questions where the right answer depends on team size, project complexity, and deployment environment.
Virtual environments isolate project dependencies from the system Python installation and from each other:

```bash
python -m venv venv
source venv/bin/activate   # Linux/macOS
venv\Scripts\activate      # Windows
```
Comparing the main dependency management tools:
- pip + venv: built into Python; simple, but requirements.txt mixes direct and transitive dependencies
- Poetry: declares dependencies in pyproject.toml and produces a deterministic lock file; also handles packaging
- pipenv: wraps pip and virtualenv management behind a Pipfile and Pipfile.lock
- conda: manages non-Python binary dependencies as well, common in data science stacks
Ensuring reproducible builds:
```bash
# With pip
pip freeze > requirements.txt
pip install -r requirements.txt

# With poetry
poetry export -f requirements.txt --output requirements.txt
poetry install --no-root
```
For production deployments, always pin exact versions in your lock file and commit it to version control. The development environment should install from the lock file, not resolve dependencies fresh each time. Otherwise, a new minor version of a transitive dependency can break a production deployment without any changes to your own code.
Knowing the answers is one part of interview preparation. Showing up ready to have a real technical conversation, think through problems clearly, and communicate your reasoning is equally important. Here's a preparation strategy tailored to where you are right now.
If you're preparing for your first Python role, the foundational sections are your highest priority, and they're also where most early-career candidates lose interviews they could have won.
What to focus on:
- Explaining core data structures (lists vs tuples, dicts vs sets) and mutability without hesitation
- Writing small programs by hand: string manipulation, loops, comprehensions, basic file handling
- Understanding how function arguments, scope, and return values actually behave

Common mistakes to avoid:
- Memorizing answers without writing code; interviewers usually probe one level deeper than the rehearsed answer
- Starting to code before clarifying the question
- Ignoring edge cases such as empty inputs and None values
If you're aiming for a data engineering, analytics engineering, or data science role, Section 3 and Section 4, which covers important python data science interview questions, are your core preparation areas.
What to focus on:
- Pandas fundamentals: filtering, grouping, merging, and handling missing data
- NumPy vectorization and why it outperforms plain Python loops
- Reading and writing common formats (CSV, JSON, Parquet) and processing data in chunks

Specific topics that consistently appear in Python Interview Questions for Data Engineer rounds:
- loc vs iloc, merge vs join, and groupby aggregations
- Processing datasets larger than memory with generators and chunked reads
- Handling messy real-world data: missing values, mixed types, malformed rows
One preparation habit that separates strong data engineering candidates: practice narrating what you're doing as you write code. Interviewers in data roles aren't just evaluating whether your code works; they're evaluating whether you can explain your reasoning to non-technical stakeholders.
If you're preparing for a senior, lead, or architect-level role, Sections 5 and 6 are where the interview is actually won or lost. Advanced Python Interview Questions at this level don't just test knowledge; they test judgment.
What to focus on:
- Concurrency trade-offs: threads vs processes vs async, and the reasoning behind each choice
- Profiling-driven optimization rather than isolated tricks
- Designing for testability, observability, and graceful failure

Topics that consistently separate senior candidates from mid-level ones:
- Explaining the GIL's practical consequences, not just its definition
- Knowing when a pattern or abstraction is unnecessary
- Reasoning about memory behavior, dependency management, and reproducible deployments
Regardless of your experience level, a few habits consistently improve interview performance across all Python Interview Questions categories:
- Think out loud; silent coding hides the reasoning the interviewer is there to evaluate
- Ask clarifying questions before committing to an approach
- State the trade-offs of your solution without being prompted
- Practice writing code in a plain editor, without autocomplete
Python interviews in 2026 test a wide range of skills. From Basic Python Interview Questions on syntax and data types, through data handling and algorithmic problem-solving, all the way to production-level architecture and optimization thinking.
The questions in this guide cover the full spectrum of what interviewers actually assess. The strongest candidates across all six sections share one thing in common: they understand why Python features exist, not just what they do. That depth of understanding is what turns a passing interview into an exceptional one.
For data engineering candidates, Section 3 and Section 4 are your highest-priority preparation investments. For senior candidates, Sections 5 and 6 are where the real differentiation happens. For everyone, the foundational sections are non-negotiable. No amount of advanced knowledge compensates for shaky fundamentals when an interviewer starts probing.
Work through this guide section by section. Write actual code for every question you can't answer confidently from memory. Revisit any section where your answers feel surface-level. The goal isn't to memorize answers; it's to build the genuine understanding that makes any variation of these questions approachable.

Python is the language that powers Generative AI and knowing it well puts you in a strong position to build real AI systems, not just use them.
NovelVista's Generative AI Professional Certification takes your Python knowledge further, covering LLM integration, RAG pipelines, agent frameworks, and production deployment in a structured, hands-on curriculum built around what hiring managers actually look for in 2026.
Explore NovelVista's Generative AI Professional Certification today.